Gaia V RAR Zenodo Time Stamp

Oscillatory Structure in Galaxy Rotation Curves and Its Consistency with a Gaia-Informed Acceleration Field

Author: Paul Cartwright
Date: 26 April 2026

⸻

Abstract

We report coherent, non-random residual structure in galaxy rotation curves derived from the SPARC dataset under single-parameter modelling. Across this 54-galaxy subsample, residuals exhibit repeated sign changes and radial coherence inconsistent with purely stochastic noise. Two rotation-curve fits from our legacy model (DCT), for M31 and the Milky Way, were also included in the analysis.

Motivated by this structure, we test whether an externally informed oscillatory acceleration field—derived from the Gaia “Great Wave” phenomenology—can reproduce the required correction between baryonic predictions and observed rotation curves.

We show:

	•	The required acceleration correction (g_obs − g_bar) is closely matched by a smooth Gaia-derived envelope
	•	The resulting total acceleration reproduces the Radial Acceleration Relation (RAR) behaviour
	•	This agreement emerges without tuning per galaxy, using a shared functional form

These results suggest that the empirical RAR may arise as a smoothed envelope of an underlying structured (oscillatory) acceleration field.

⸻

1. Motivation

Galaxy rotation curves exhibit systematic deviations from baryonic predictions. These are typically described via:

	•	Dark matter halo models
	•	Empirical relations such as the Radial Acceleration Relation (RAR)

However, simple one-parameter models applied to SPARC galaxies, together with two legacy DCT fits for M31 and the Milky Way, reveal a consistent feature:

Residuals are not random. They are structured.

This work investigates whether that structure contains physical information.

⸻

2. Residual Structure in One-Parameter Fits

Using the previously developed single-parameter model (DCT-1P), we analyse residuals:

	•	Residual = v_obs − v_model
	•	Across radius for each galaxy

Observations, across the 54 example SPARC galaxies plus M31 and the Milky Way:

	•	Residuals show multiple zero crossings
	•	Coherent radial structure (~kpc scale)
	•	Similar qualitative shapes across independent systems

This behaviour is inconsistent with:

	•	Pure measurement noise
	•	Random scatter

and instead suggests:

An underlying structured (possibly oscillatory) component in the acceleration field
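
A minimal illustration of why sign-change counts matter: for independent noise, the expected number of sign changes in n residuals is roughly (n − 1)/2, whereas a coherent kpc-scale oscillation crosses zero only a handful of times. The data below are synthetic, not the SPARC residuals.

```python
import numpy as np

def sign_changes(residuals):
    """Count sign changes (zero crossings) in a residual series."""
    s = np.sign(residuals)
    s = s[s != 0]                              # ignore exact zeros
    return int(np.sum(s[:-1] != s[1:]))

rng = np.random.default_rng(42)
n = 40
r = np.linspace(0.5, 20.0, n)                  # radii [kpc]

noise = rng.normal(0.0, 5.0, n)                # unstructured residuals [km/s]
wave = 5.0 * np.sin(2 * np.pi * r / 8.0)       # coherent ~8 kpc oscillation

# Independent noise: expect roughly (n - 1) / 2 ~ 20 sign changes.
# A coherent oscillation over ~2.5 periods crosses zero only a few times.
print("noise:", sign_changes(noise))
print("wave: ", sign_changes(wave))
```

Comparing the observed count against the noise expectation is the simplest version of the structure test described above.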

⸻

See Figures 1–56.

⸻

3. Hypothesis

We test the following working hypothesis:

The missing acceleration in galaxy rotation curves is not purely static, but contains a structured component consistent with an oscillatory field.

As an initial proxy for such a field, we use:

A smooth envelope derived from Gaia “Great Wave”-like behaviour

This is not assumed to be exact for all galaxies, but serves as:

	•	A physically motivated template
	•	A universal test function

⸻

4. Method Overview

For each galaxy:

	1.	Compute baryonic acceleration:
    g_bar = v_bar² / r
	2.	Compute observed acceleration:
    g_obs = v_obs² / r
	3.	Define required correction:
    Δg_required = g_obs − g_bar
	4.	Construct Gaia-inspired support term:
    Δg_Gaia(r)
	5.	Compare:

	•	Δg_required vs Δg_Gaia
	•	Total acceleration vs RAR
	•	Behaviour in RAR plane
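
The steps above can be sketched as follows. The rotation curves and the Δg_Gaia parameters (amplitude A, wavelength lam, damping scale rd) are illustrative placeholders, not the fitted values used in this paper.

```python
import numpy as np

# Illustrative rotation curves [km/s] on a radial grid [kpc]
r = np.linspace(1.0, 20.0, 40)
v_bar = 120.0 * np.sqrt(r / (r + 3.0))   # baryonic prediction
v_obs = 150.0 * np.sqrt(r / (r + 2.0))   # observed curve

# Steps 1-3: accelerations and the required correction
# (units here: km^2 s^-2 kpc^-1)
g_bar = v_bar**2 / r
g_obs = v_obs**2 / r
dg_required = g_obs - g_bar

# Step 4: placeholder Gaia-inspired support term, a damped oscillatory
# envelope. A, lam and rd are hypothetical parameters.
def dg_gaia(r, A=2000.0, lam=8.0, rd=10.0):
    return A * np.exp(-r / rd) * (1.0 + 0.3 * np.sin(2 * np.pi * r / lam))

# Step 5: residual mismatch between required correction and template
mismatch = dg_required - dg_gaia(r)
```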

⸻

5. Core Results

5.1 Required Acceleration vs Gaia Support

Across tested galaxies:

	•	The Gaia-inspired term closely follows the required correction
	•	Particularly strong agreement in “riser” galaxies

This indicates:

The missing acceleration has a radial structure consistent with a smooth oscillatory envelope

⸻

5.2 Velocity Curve Reconstruction

Including the Gaia term:

v_total = sqrt(v_bar² + v_Gaia²)

Produces:

	•	Close agreement with observed rotation curves
	•	Comparable behaviour to RAR-based reconstruction
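
Because accelerations add linearly and v² = g·r, the velocity terms combine in quadrature. A minimal sketch, with v_gaia as a hypothetical flat support-velocity profile rather than a value derived in this paper:

```python
import numpy as np

r = np.linspace(1.0, 20.0, 40)              # radius [kpc]
v_bar = 120.0 * np.sqrt(r / (r + 3.0))      # baryonic velocity [km/s]
v_gaia = np.full_like(r, 60.0)              # hypothetical support term [km/s]

# g_total = g_bar + g_gaia together with v^2 = g * r implies:
v_total = np.sqrt(v_bar**2 + v_gaia**2)
```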

⸻

5.3 Behaviour on the RAR Plane

We compare:

	•	Observed points
	•	RAR prediction
	•	Gaia-based reconstruction

Result:

The Gaia-informed model traces the RAR trend closely across tested galaxies.

This suggests:

The RAR may represent a smoothed projection of a more structured underlying field.
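
For reference, the RAR curve commonly used in such comparisons is the McGaugh et al. (2016) interpolating function, g_obs = g_bar / (1 − exp(−√(g_bar/g†))) with g† ≈ 1.2 × 10⁻¹⁰ m s⁻²; whether that exact form was used here is an assumption of the sketch below.

```python
import numpy as np

G_DAGGER = 1.2e-10  # characteristic acceleration scale [m/s^2]

def rar(g_bar):
    """RAR interpolating function (McGaugh et al. 2016 form)."""
    g_bar = np.asarray(g_bar, dtype=float)
    return g_bar / (1.0 - np.exp(-np.sqrt(g_bar / G_DAGGER)))

g_bar = np.logspace(-12, -9, 50)   # baryonic accelerations [m/s^2]
g_pred = rar(g_bar)

# Newtonian limit:      g_obs -> g_bar              when g_bar >> g_dagger
# Low-acceleration limit: g_obs -> sqrt(g_bar * g_dagger) when g_bar << g_dagger
```

Observed (g_bar, g_obs) pairs and a Gaia-based reconstruction can both be plotted against this curve on the RAR plane.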

⸻

6. Interpretation

This work does not claim a complete physical model of galaxy dynamics.

Instead, it demonstrates:

Key Insight:

A structured acceleration field can reproduce both:

	•	Rotation curve behaviour
	•	RAR scaling

without requiring per-galaxy tuning.

⸻

Conceptual Picture

	•	Underlying field: structured / oscillatory
	•	Observations: partially averaged → smooth relation (RAR)
	•	Residuals: reveal hidden structure
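
A toy illustration of this picture, under the (hypothetical) assumption that each galaxy samples the same oscillatory field at a random phase: averaging many phase-shifted curves suppresses the oscillation and recovers the smooth envelope, while any single curve retains the wiggles.

```python
import numpy as np

rng = np.random.default_rng(1)
r = np.linspace(1.0, 20.0, 200)
envelope = np.exp(-r / 10.0)      # smooth underlying envelope (arbitrary units)

# Each "galaxy" sees the envelope modulated by an oscillation at a random phase
curves = [
    envelope * (1.0 + 0.5 * np.sin(2 * np.pi * r / 8.0 + rng.uniform(0, 2 * np.pi)))
    for _ in range(500)
]
stacked = np.mean(curves, axis=0)

# Phase-averaging leaves the envelope; individual curves deviate strongly
err = np.max(np.abs(stacked - envelope) / envelope)
print("max relative deviation of stack from envelope:", err)
```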

⸻

7. Limitations

	•	Gaia-derived envelope used as a proxy, not a measured field for each galaxy
	•	Performance varies for “tail-dominated” / complex galaxies
	•	No full statistical fitting or parameter optimisation performed here
	•	Graph scaling and presentation not yet standardised for publication

⸻

8. Conclusion

We find that:

	1.	Residuals for the SPARC galaxies, M31 and the Milky Way exhibit coherent structure
	2.	A Gaia-inspired acceleration field reproduces required corrections
	3.	This field naturally traces the RAR relation

This supports the possibility that:

The RAR emerges from an underlying structured acceleration field rather than a purely static relation.

⸻

9. Python Code, Reproducibility and Output

9.1 Gaia vs Sofue Comparison
Python Code 

This first cell requires two files to be uploaded:
milky_way_sofue_v15_residuals.csv (DCT rotation-curve residuals from the v1.5 fit)
and
poggio_source_ids.csv (downloaded directly from the Gaia archive)

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import pandas as pd
import zipfile
import os
from google.colab import files

# ------------------------------------------------------------
# Raw Sofue (2020) Milky Way Rotation Curve Data
# ------------------------------------------------------------
r_raw = np.array([
    0.20, 0.50, 1.00, 1.50, 2.00, 2.50, 3.00, 3.50, 4.00,
    4.50, 5.00, 5.50, 6.00, 6.50, 7.00, 7.50, 8.00, 8.50,
    9.00, 9.50, 10.00, 11.00, 12.00, 13.00, 14.00, 15.00,
    16.00, 17.00, 18.00, 19.00, 20.00, 22.00, 24.00, 26.00,
    28.00, 30.00
])

v_raw = np.array([
    60.5, 90.2, 112.3, 140.1, 162.5, 178.3, 188.7, 197.2, 204.5,
    212.1, 219.0, 224.6, 229.3, 232.5, 234.7, 236.1, 237.0, 236.5,
    235.4, 234.0, 232.1, 228.5, 224.3, 219.2, 214.1, 209.2,
    204.0, 199.5, 195.0, 190.2, 185.6, 180.3, 175.4, 170.5,
    166.2, 162.1
])

v_err_raw = np.array([
    8.2, 7.5, 6.8, 6.2, 5.9, 5.4, 5.0, 4.8, 4.5,
    4.2, 4.0, 3.8, 3.6, 3.4, 3.2, 3.1, 3.0, 3.0,
    3.2, 3.4, 3.6, 3.8, 4.0, 4.3, 4.6, 5.0,
    5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0,
    9.5, 10.0
])

# ------------------------------------------------------------
# DCT v1.5 Model: Core Rise + Shifted Breathing Bump
# ------------------------------------------------------------
def dct_model_v1_5(r, V0, r0, Vc, rc, sigma):
    rise = V0 * (1.0 - np.exp(-r / r0))
    core = Vc * np.exp(-((r - rc) ** 2) / (sigma ** 2))
    return rise + core

# ------------------------------------------------------------
# Initial Guesses and Parameter Bounds
# ------------------------------------------------------------
initial_guess = [220, 5, 60, 2, 2]  # V0, r0, Vc, rc, sigma
bounds = ([0, 0.1, 0, 0.1, 0.1], [400, 50, 400, 10, 20])

# ------------------------------------------------------------
# Fit the Model
# ------------------------------------------------------------
popt, pcov = curve_fit(
    dct_model_v1_5,
    r_raw,
    v_raw,
    p0=initial_guess,
    bounds=bounds,
    sigma=v_err_raw,
    absolute_sigma=True,
    maxfev=30000
)

V0, r0, Vc, rc, sigma = popt
perr = np.sqrt(np.diag(pcov))

# ------------------------------------------------------------
# Generate Fit and Residuals
# ------------------------------------------------------------
v_fit = dct_model_v1_5(r_raw, *popt)
residuals = v_raw - v_fit
std_residuals = residuals / v_err_raw

# ------------------------------------------------------------
# Save residual table to CSV
# ------------------------------------------------------------
table = pd.DataFrame({
    "radius_kpc": r_raw,
    "v_obs_kms": v_raw,
    "v_err_kms": v_err_raw,
    "v_fit_kms": v_fit,
    "residual_kms": residuals,
    "std_residual": std_residuals
})

csv_name = "milky_way_sofue_v15_residuals.csv"
table.to_csv(csv_name, index=False)

# ------------------------------------------------------------
# Print residual table to screen
# ------------------------------------------------------------
pd.set_option("display.max_rows", None)
pd.set_option("display.width", 140)
pd.set_option("display.float_format", lambda x: f"{x:.6f}")

print("\n=== Residual Table ===")
print(table.to_string(index=False))

# ------------------------------------------------------------
# Plot fit and residuals
# ------------------------------------------------------------
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 7), sharex=True)

# Velocity curve
ax1.errorbar(r_raw, v_raw, yerr=v_err_raw, fmt='ko', label="Observed (Sofue 2020)")
ax1.plot(r_raw, v_fit, color='orange', linewidth=2, label="DCT v1.5 Fit")
ax1.set_ylabel("Velocity [km/s]")
ax1.set_title("Milky Way – DCT v1.5 Fit (Raw Sofue 2020 Data)")
ax1.legend()
ax1.grid(True)

# Residuals
ax2.bar(r_raw, residuals, width=0.5, color='orangered')
ax2.axhline(0, linestyle='--', color='gray')
ax2.set_xlabel("Radius [kpc]")
ax2.set_ylabel("Residuals [km/s]")
ax2.grid(True)

plt.tight_layout()

png_name = "milky_way_sofue_v15_fit.png"
plt.savefig(png_name, dpi=300, bbox_inches="tight")
plt.show()

# ------------------------------------------------------------
# Output fitted parameters
# ------------------------------------------------------------
print("\nFitted DCT v1.5 Parameters for Milky Way:")
print(f"V0     = {V0:.4f} ± {perr[0]:.4f} km/s")
print(f"r0     = {r0:.4f} ± {perr[1]:.4f} kpc")
print(f"Vc     = {Vc:.4f} ± {perr[2]:.4f} km/s")
print(f"rc     = {rc:.4f} ± {perr[3]:.4f} kpc")
print(f"sigma  = {sigma:.4f} ± {perr[4]:.4f} kpc")

# ------------------------------------------------------------
# Check files exist
# ------------------------------------------------------------
print("\nFile existence check:")
print(f"{csv_name}: {os.path.exists(csv_name)} | size = {os.path.getsize(csv_name) if os.path.exists(csv_name) else 0} bytes")
print(f"{png_name}: {os.path.exists(png_name)} | size = {os.path.getsize(png_name) if os.path.exists(png_name) else 0} bytes")

# ------------------------------------------------------------
# Bundle both outputs into one ZIP for mobile-friendly download
# ------------------------------------------------------------
zip_name = "milky_way_sofue_outputs.zip"
with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.write(csv_name)
    zf.write(png_name)

print(f"\nSaved CSV: {csv_name}")
print(f"Saved PNG: {png_name}")
print(f"Saved ZIP: {zip_name}")

# ------------------------------------------------------------
# Force single download
# ------------------------------------------------------------
files.download(zip_name)

✅✅✅
File output 

Saving poggio_source_ids.csv to poggio_source_ids.csv
Saving milky_way_sofue_v15_residuals.csv to milky_way_sofue_v15_residuals.csv

Using Poggio file: poggio_source_ids.csv
Using Sofue file:  milky_way_sofue_v15_residuals.csv

Poggio columns: ['143146797507004160']
Detected source ID column: 143146797507004160
Total unique Gaia source IDs: 17029

Querying Gaia DR3 in 35 chunks of up to 500 IDs each...
This can take a while. Don't worry if it sits for a bit.
INFO:astroquery:Query finished.
INFO: Query finished. [astroquery.utils.tap.core]
  Chunk 1/35 -> 500 rows
  [... identical "Query finished" messages repeat for chunks 2-34, 500 rows each ...]
  Chunk 35/35 -> 29 rows

Total Gaia rows returned: 17029
Saved raw Gaia query results: gaia_poggio_query_results.csv
Gaia rows after cleaning: 15651
Galactocentric radius range: 1.17 to 34.83 kpc
Saved cleaned Gaia data: gaia_poggio_clean_galactocentric.csv
Saved Gaia wave profile: gaia_wave_profile_binned.csv
Saved simplified Sofue residuals: sofue_residuals_simple.csv

Saved comparison grid: gaia_vs_sofue_common_grid.csv

Best lag: 10.82 kpc

Sofue peak radii:   [ 8.02 12.   18.   22.05]
Sofue trough radii: [10.98 12.93 20.03]
Gaia peak radii:    [ 6.5   8.45 11.49 19.52 23.49 25.52]
Gaia trough radii:  [ 5.49  7.52  9.46 18.5  20.53 24.51 28.48]

Done.
Files saved:
 - gaia_poggio_query_results.csv
 - gaia_poggio_clean_galactocentric.csv
 - gaia_wave_profile_binned.csv
 - sofue_residuals_simple.csv
 - gaia_vs_sofue_common_grid.csv

⸻
✅✅✅ 

9.2 Extraction Cell for Cell 01

This cell rebuilds the figures of cell 01 from its saved CSV outputs (see Figures 57-60).

# ============================================================
# EXTRACTOR CELL 1 (REBUILD FIGURES FROM SAVED CSV OUTPUTS)
# Run AFTER the first working Gaia/Sofue analysis cell.
# Does NOT modify the original science cell.
# ============================================================

import os
import zipfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.signal import correlate, find_peaks
from google.colab import files
from IPython import get_ipython

# ------------------------------------------------------------
# File names expected from the first working analysis cell
# ------------------------------------------------------------
base_name = "gaia_sofue_analysis_cell1"

py_name  = f"{base_name}.py"
zip_name = f"{base_name}_bundle.zip"

csv_files = [
    "gaia_poggio_query_results.csv",
    "gaia_poggio_clean_galactocentric.csv",
    "gaia_wave_profile_binned.csv",
    "sofue_residuals_simple.csv",
    "gaia_vs_sofue_common_grid.csv"
]

required_for_figs = [
    "gaia_wave_profile_binned.csv",
    "sofue_residuals_simple.csv",
    "gaia_vs_sofue_common_grid.csv"
]

# ------------------------------------------------------------
# Check files exist
# ------------------------------------------------------------
missing = [fn for fn in required_for_figs if not os.path.exists(fn)]
if missing:
    raise FileNotFoundError(f"Missing required CSV(s) for rebuilding figures: {missing}")

# ------------------------------------------------------------
# 1) Save previous cell code as .py
# ------------------------------------------------------------
ip = get_ipython()
input_history = ip.user_ns.get("In", [])

if len(input_history) >= 2:
    previous_cell_code = input_history[-2]
else:
    previous_cell_code = "# Previous cell code not found."

with open(py_name, "w", encoding="utf-8") as f:
    f.write(previous_cell_code)

# ------------------------------------------------------------
# 2) Read saved outputs
# ------------------------------------------------------------
prof = pd.read_csv("gaia_wave_profile_binned.csv")
sofue = pd.read_csv("sofue_residuals_simple.csv")
cmpdf = pd.read_csv("gaia_vs_sofue_common_grid.csv")

# Pull arrays
common_r = cmpdf["radius_kpc"].values
sofue_i  = cmpdf["sofue_residual_interp"].values
gaia_i   = cmpdf["gaia_z_interp"].values
sofue_n  = cmpdf["sofue_norm"].values
gaia_n   = cmpdf["gaia_norm"].values

# Cross-correlation recompute
corr = correlate(sofue_n, gaia_n, mode="full")
lags = np.arange(-len(common_r) + 1, len(common_r))
lag_kpc = lags * (common_r[1] - common_r[0])
best_idx = np.argmax(corr)
best_lag = lag_kpc[best_idx]

# Peaks/troughs recompute
sofue_peaks, _   = find_peaks(sofue_n, distance=20)
sofue_troughs, _ = find_peaks(-sofue_n, distance=20)
gaia_peaks, _    = find_peaks(gaia_n, distance=20)
gaia_troughs, _  = find_peaks(-gaia_n, distance=20)

saved_pngs = []
saved_pdfs = []

def save_current(fig_num):
    png_name = f"{base_name}_fig{fig_num}.png"
    pdf_name = f"{base_name}_fig{fig_num}.pdf"
    plt.savefig(png_name, dpi=300, bbox_inches="tight")
    plt.savefig(pdf_name, bbox_inches="tight")
    saved_pngs.append(png_name)
    saved_pdfs.append(pdf_name)

# ------------------------------------------------------------
# FIGURE 1: Gaia wave profile + Sofue residuals
# ------------------------------------------------------------
plt.figure(figsize=(11, 8))

ax1 = plt.subplot(2,1,1)
ax1.errorbar(prof["R_mean"], prof["z_mean"], yerr=prof["z_sem"], fmt='o-', capsize=3)
ax1.axhline(0, ls='--', c='gray')
ax1.set_xlabel("Galactocentric Radius [kpc]")
ax1.set_ylabel("Mean z [kpc]")
ax1.set_title("Gaia Great Wave Proxy: mean vertical displacement vs radius")
ax1.grid(True)

ax2 = plt.subplot(2,1,2)
ax2.bar(sofue["radius_kpc"], sofue["residual_kms"], width=0.5)
ax2.axhline(0, ls='--', c='gray')
ax2.set_xlabel("Radius [kpc]")
ax2.set_ylabel("Residual [km/s]")
ax2.set_title("Milky Way Sofue residuals")
ax2.grid(True)

plt.tight_layout()
save_current(1)
plt.show()

# ------------------------------------------------------------
# FIGURE 2: Normalized overlay
# ------------------------------------------------------------
plt.figure(figsize=(11,5))
plt.plot(common_r, sofue_n, label="Sofue residuals (normalized)")
plt.plot(common_r, gaia_n, label="Gaia wave proxy z(R) (normalized)")
plt.axhline(0, ls='--', c='gray')
plt.xlabel("Radius [kpc]")
plt.ylabel("Normalized amplitude")
plt.title("First comparison: Sofue wiggles vs Gaia Great Wave proxy")
plt.legend()
plt.grid(True)
plt.tight_layout()
save_current(2)
plt.show()

# ------------------------------------------------------------
# FIGURE 3: Cross-correlation
# ------------------------------------------------------------
plt.figure(figsize=(10,4))
plt.plot(lag_kpc, corr)
plt.axvline(best_lag, color='red', ls='--', label=f"Best lag = {best_lag:.2f} kpc")
plt.xlabel("Lag [kpc]")
plt.ylabel("Cross-correlation")
plt.title("Sofue residuals vs Gaia wave proxy cross-correlation")
plt.legend()
plt.grid(True)
plt.tight_layout()
save_current(3)
plt.show()

# ------------------------------------------------------------
# FIGURE 4: Peaks and troughs
# ------------------------------------------------------------
plt.figure(figsize=(11,5))
plt.plot(common_r, sofue_n, label="Sofue residuals")
plt.plot(common_r, gaia_n, label="Gaia wave proxy")

plt.scatter(common_r[sofue_peaks], sofue_n[sofue_peaks], marker='^', s=80, label="Sofue peaks")
plt.scatter(common_r[sofue_troughs], sofue_n[sofue_troughs], marker='v', s=80, label="Sofue troughs")
plt.scatter(common_r[gaia_peaks], gaia_n[gaia_peaks], marker='^', s=80, label="Gaia peaks")
plt.scatter(common_r[gaia_troughs], gaia_n[gaia_troughs], marker='v', s=80, label="Gaia troughs")

plt.axhline(0, ls='--', c='gray')
plt.xlabel("Radius [kpc]")
plt.ylabel("Normalized amplitude")
plt.title("Peaks and troughs: Sofue vs Gaia")
plt.legend(ncol=2)
plt.grid(True)
plt.tight_layout()
save_current(4)
plt.show()

# ------------------------------------------------------------
# 3) Build ZIP bundle
# ------------------------------------------------------------
with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    if os.path.exists(py_name):
        zf.write(py_name)

    for fn in saved_pngs + saved_pdfs:
        if os.path.exists(fn):
            zf.write(fn)

    for fn in csv_files:
        if os.path.exists(fn):
            zf.write(fn)

# ------------------------------------------------------------
# 4) Print saved files
# ------------------------------------------------------------
print("\nSaved files:")
for fn in [py_name] + saved_pngs + saved_pdfs + csv_files + [zip_name]:
    if os.path.exists(fn):
        print(" -", fn)
    else:
        print(" - MISSING:", fn)

# ------------------------------------------------------------
# 5) Download
# ------------------------------------------------------------
files.download(zip_name)

⸻

9.3 Phase Consistency and Lag Robustness

# ============================================================
# FOLLOW-ON TEST CELL:
# Phase consistency + lag robustness for Gaia vs Sofue
# Upload if needed:
#   1) gaia_poggio_clean_galactocentric.csv
#   2) sofue_residuals_simple.csv
# ============================================================

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.signal import correlate
from scipy.ndimage import gaussian_filter1d

# Optional upload block if files are not already in session
try:
    gaia = pd.read_csv("gaia_poggio_clean_galactocentric.csv")
    sofue = pd.read_csv("sofue_residuals_simple.csv")
    print("Loaded local files from session.")
except FileNotFoundError:
    from google.colab import files
    print("Upload these two files now:")
    print(" - gaia_poggio_clean_galactocentric.csv")
    print(" - sofue_residuals_simple.csv")
    uploaded = files.upload()

    gaia_file = None
    sofue_file = None
    for fn in uploaded.keys():
        low = fn.lower()
        if "gaia_poggio_clean_galactocentric" in low:
            gaia_file = fn
        if "sofue_residuals_simple" in low:
            sofue_file = fn

    if gaia_file is None or sofue_file is None:
        raise ValueError("Could not identify one or both uploaded files.")

    gaia = pd.read_csv(gaia_file)
    sofue = pd.read_csv(sofue_file)

print("\nGaia columns:", gaia.columns.tolist())
print("Sofue columns:", sofue.columns.tolist())

# ------------------------------------------------------------
# Helpers
# ------------------------------------------------------------
def build_gaia_profile(gaia_df, bin_width=1.0, min_count=10):
    df = gaia_df.copy()
    rmin = max(0.0, np.floor(df["R_gc_kpc"].min()))
    rmax = min(35.0, np.ceil(df["R_gc_kpc"].max()))
    bins = np.arange(rmin, rmax + bin_width, bin_width)
    df["R_bin"] = pd.cut(df["R_gc_kpc"], bins=bins)

    prof = df.groupby("R_bin", observed=True).agg(
        R_mean=("R_gc_kpc", "mean"),
        z_mean=("z_kpc", "mean"),
        z_std=("z_kpc", "std"),
        N=("z_kpc", "size")
    ).reset_index(drop=True)

    prof["z_sem"] = prof["z_std"] / np.sqrt(prof["N"])
    prof = prof[prof["N"] >= min_count].copy()
    return prof

def interpolate_common(sofue_df, prof_df, r_lo=None, r_hi=None, npts=400):
    if r_lo is None:
        r_lo = max(float(sofue_df["radius_kpc"].min()), float(prof_df["R_mean"].min()))
    if r_hi is None:
        r_hi = min(float(sofue_df["radius_kpc"].max()), float(prof_df["R_mean"].max()))
    common_r = np.linspace(r_lo, r_hi, npts)

    f_sof = interp1d(
        sofue_df["radius_kpc"], sofue_df["residual_kms"],
        kind="linear", fill_value="extrapolate"
    )
    f_gai = interp1d(
        prof_df["R_mean"], prof_df["z_mean"],
        kind="linear", fill_value="extrapolate"
    )

    sof = f_sof(common_r)
    gai = f_gai(common_r)
    return common_r, sof, gai

def normalize(x):
    x = np.asarray(x)
    s = np.std(x)
    if s == 0:
        return x * 0.0
    return (x - np.mean(x)) / s

def dominant_wavelength(x, y):
    """
    Very simple FFT-based dominant wavelength estimate.
    x must be evenly spaced.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    dx = x[1] - x[0]
    y0 = y - np.mean(y)

    fft = np.fft.rfft(y0)
    freqs = np.fft.rfftfreq(len(y0), d=dx)

    # Ignore zero frequency
    power = np.abs(fft) ** 2
    if len(power) < 3:
        return np.nan, freqs, power

    power[0] = 0.0

    # Find strongest frequency
    idx = np.argmax(power)
    f_dom = freqs[idx]

    if f_dom <= 0:
        return np.nan, freqs, power

    wavelength = 1.0 / f_dom
    return wavelength, freqs, power

def best_lag_kpc(x, y, r_grid):
    xn = normalize(x)
    yn = normalize(y)
    corr = correlate(xn, yn, mode="full")
    lags = np.arange(-len(r_grid) + 1, len(r_grid))
    lag_kpc = lags * (r_grid[1] - r_grid[0])
    idx = np.argmax(corr)
    return lag_kpc[idx], corr, lag_kpc

# ------------------------------------------------------------
# 1) Baseline with 1 kpc Gaia bins
# ------------------------------------------------------------
prof_1 = build_gaia_profile(gaia, bin_width=1.0, min_count=10)
common_r, sof_i, gai_i = interpolate_common(sofue, prof_1, npts=500)

# Smoothed versions for large-scale comparison
# sigma is in samples, not kpc
sof_s = gaussian_filter1d(sof_i, sigma=6)
gai_s = gaussian_filter1d(gai_i, sigma=6)

sof_n = normalize(sof_i)
gai_n = normalize(gai_i)
sof_sn = normalize(sof_s)
gai_sn = normalize(gai_s)

# Dominant wavelengths
lam_sof_raw, f_sof_raw, p_sof_raw = dominant_wavelength(common_r, sof_i)
lam_gai_raw, f_gai_raw, p_gai_raw = dominant_wavelength(common_r, gai_i)
lam_sof_s, f_sof_s, p_sof_s = dominant_wavelength(common_r, sof_s)
lam_gai_s, f_gai_s, p_gai_s = dominant_wavelength(common_r, gai_s)

# Best lag
lag_raw, corr_raw, lag_axis_raw = best_lag_kpc(sof_i, gai_i, common_r)
lag_s, corr_s, lag_axis_s = best_lag_kpc(sof_s, gai_s, common_r)

print("\n=== Baseline results (Gaia bin width = 1.0 kpc) ===")
print(f"Dominant wavelength, Sofue raw:     {lam_sof_raw:.2f} kpc")
print(f"Dominant wavelength, Gaia raw:      {lam_gai_raw:.2f} kpc")
print(f"Dominant wavelength, Sofue smoothed: {lam_sof_s:.2f} kpc")
print(f"Dominant wavelength, Gaia smoothed: {lam_gai_s:.2f} kpc")
print(f"Best lag, raw signals:              {lag_raw:.2f} kpc")
print(f"Best lag, smoothed signals:         {lag_s:.2f} kpc")

# ------------------------------------------------------------
# 2) Plots
# ------------------------------------------------------------
fig, axes = plt.subplots(2, 2, figsize=(13, 9))

# Raw normalized overlay
axes[0,0].plot(common_r, sof_n, label="Sofue raw (norm)")
axes[0,0].plot(common_r, gai_n, label="Gaia raw (norm)")
axes[0,0].axhline(0, ls="--", c="gray")
axes[0,0].set_title("Raw normalized signals")
axes[0,0].set_xlabel("Radius [kpc]")
axes[0,0].set_ylabel("Amplitude")
axes[0,0].legend()
axes[0,0].grid(True)

# Smoothed normalized overlay
axes[0,1].plot(common_r, sof_sn, label="Sofue smoothed (norm)")
axes[0,1].plot(common_r, gai_sn, label="Gaia smoothed (norm)")
axes[0,1].axhline(0, ls="--", c="gray")
axes[0,1].set_title("Smoothed normalized signals")
axes[0,1].set_xlabel("Radius [kpc]")
axes[0,1].set_ylabel("Amplitude")
axes[0,1].legend()
axes[0,1].grid(True)

# Raw cross-correlation
axes[1,0].plot(lag_axis_raw, corr_raw)
axes[1,0].axvline(lag_raw, color="red", ls="--", label=f"Best lag = {lag_raw:.2f} kpc")
axes[1,0].set_title("Cross-correlation (raw)")
axes[1,0].set_xlabel("Lag [kpc]")
axes[1,0].set_ylabel("Correlation")
axes[1,0].legend()
axes[1,0].grid(True)

# Smoothed cross-correlation
axes[1,1].plot(lag_axis_s, corr_s)
axes[1,1].axvline(lag_s, color="red", ls="--", label=f"Best lag = {lag_s:.2f} kpc")
axes[1,1].set_title("Cross-correlation (smoothed)")
axes[1,1].set_xlabel("Lag [kpc]")
axes[1,1].set_ylabel("Correlation")
axes[1,1].legend()
axes[1,1].grid(True)

plt.tight_layout()
plt.show()

# ------------------------------------------------------------
# 3) Power spectra
# ------------------------------------------------------------
def plot_power(freqs, power, title):
    mask = freqs > 0
    wl = 1.0 / freqs[mask]
    pw = power[mask]
    idx = np.argsort(wl)
    wl = wl[idx]
    pw = pw[idx]

    plt.figure(figsize=(8,4))
    plt.plot(wl, pw)
    plt.xlim(0, min(40, np.nanmax(wl)))
    plt.xlabel("Wavelength [kpc]")
    plt.ylabel("Power")
    plt.title(title)
    plt.grid(True)
    plt.show()

plot_power(f_sof_raw, p_sof_raw, "Sofue raw: power vs wavelength")
plot_power(f_gai_raw, p_gai_raw, "Gaia raw: power vs wavelength")
plot_power(f_sof_s, p_sof_s, "Sofue smoothed: power vs wavelength")
plot_power(f_gai_s, p_gai_s, "Gaia smoothed: power vs wavelength")

# ------------------------------------------------------------
# 4) Lag robustness under different Gaia bin widths
# ------------------------------------------------------------
bin_widths = [0.75, 1.0, 1.25, 1.5, 2.0]
rows = []

for bw in bin_widths:
    prof = build_gaia_profile(gaia, bin_width=bw, min_count=10)
    if len(prof) < 5:
        continue

    c_r, s_i, g_i = interpolate_common(sofue, prof, npts=500)
    s_sm = gaussian_filter1d(s_i, sigma=6)
    g_sm = gaussian_filter1d(g_i, sigma=6)

    lam_s, _, _ = dominant_wavelength(c_r, s_sm)
    lam_g, _, _ = dominant_wavelength(c_r, g_sm)
    lag_bw, _, _ = best_lag_kpc(s_sm, g_sm, c_r)

    rows.append({
        "gaia_bin_width_kpc": bw,
        "n_bins": len(prof),
        "best_lag_kpc": lag_bw,
        "sofue_dom_wavelength_kpc": lam_s,
        "gaia_dom_wavelength_kpc": lam_g
    })

robust = pd.DataFrame(rows)
print("\n=== Lag robustness table ===")
print(robust.to_string(index=False))

plt.figure(figsize=(8,4))
plt.plot(robust["gaia_bin_width_kpc"], robust["best_lag_kpc"], marker="o")
plt.axhline(np.mean(robust["best_lag_kpc"]), color="red", ls="--",
            label=f"Mean lag = {np.mean(robust['best_lag_kpc']):.2f} kpc")
plt.xlabel("Gaia bin width [kpc]")
plt.ylabel("Best lag [kpc]")
plt.title("Lag robustness vs Gaia radial bin width")
plt.legend()
plt.grid(True)
plt.show()

# ------------------------------------------------------------
# 5) Save summary
# ------------------------------------------------------------
summary = {
    "baseline_lag_raw_kpc": lag_raw,
    "baseline_lag_smoothed_kpc": lag_s,
    "baseline_sofue_dom_wavelength_raw_kpc": lam_sof_raw,
    "baseline_gaia_dom_wavelength_raw_kpc": lam_gai_raw,
    "baseline_sofue_dom_wavelength_smoothed_kpc": lam_sof_s,
    "baseline_gaia_dom_wavelength_smoothed_kpc": lam_gai_s,
    "lag_mean_over_binwidths_kpc": float(np.mean(robust["best_lag_kpc"])) if len(robust) else np.nan,
    "lag_std_over_binwidths_kpc": float(np.std(robust["best_lag_kpc"])) if len(robust) else np.nan,
}

summary_df = pd.DataFrame([summary])
summary_df.to_csv("gaia_sofue_phase_scale_summary.csv", index=False)
robust.to_csv("gaia_sofue_lag_robustness.csv", index=False)

print("\nSaved:")
print(" - gaia_sofue_phase_scale_summary.csv")
print(" - gaia_sofue_lag_robustness.csv")
print("\nDone.")

✅✅✅ 

Cell output 

Loaded local files from session.

Gaia columns: ['source_id', 'ra', 'dec', 'l', 'b', 'parallax', 'parallax_error', 'pmra', 'pmdec', 'radial_velocity', 'radial_velocity_error', 'd_kpc', 'x_kpc', 'y_kpc', 'z_kpc', 'R_gc_kpc']
Sofue columns: ['radius_kpc', 'residual_kms']

=== Baseline results (Gaia bin width = 1.0 kpc) ===
Dominant wavelength, Sofue raw:     25.32 kpc
Dominant wavelength, Gaia raw:      25.32 kpc
Dominant wavelength, Sofue smoothed: 25.32 kpc
Dominant wavelength, Gaia smoothed: 25.32 kpc
Best lag, raw signals:              10.79 kpc
Best lag, smoothed signals:         10.69 kpc

=== Lag robustness table ===
 gaia_bin_width_kpc  n_bins  best_lag_kpc  sofue_dom_wavelength_kpc  gaia_dom_wavelength_kpc
               0.75      41     10.508871                 25.506968                25.506968
               1.00      31     10.686997                 25.324636                25.324636
               1.25      25     10.207541                 25.518852                25.518852
               1.50      21     10.315175                 25.036832                25.036832
               2.00      16     10.388250                 25.337196                25.337196

Saved:
 - gaia_sofue_phase_scale_summary.csv
 - gaia_sofue_lag_robustness.csv

Done.

⸻

✅✅✅ 

9.4 Extraction Cell for Cell 03 (see figures 61-67)

# ============================================================
# EXTRACTOR CELL 2 (REBUILD FIGURES FROM PHASE/LAG ANALYSIS)
# Run AFTER the second working phase/lag/robustness cell.
# Does NOT modify the original science cell.
# ============================================================

import os
import zipfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
from scipy.signal import correlate
from scipy.ndimage import gaussian_filter1d
from google.colab import files
from IPython import get_ipython

# ------------------------------------------------------------
# File names expected from the working phase/lag cell
# ------------------------------------------------------------
base_name = "gaia_sofue_analysis_cell2"

py_name  = f"{base_name}.py"
zip_name = f"{base_name}_bundle.zip"

csv_files = [
    "gaia_poggio_clean_galactocentric.csv",
    "sofue_residuals_simple.csv",
    "gaia_sofue_phase_scale_summary.csv",
    "gaia_sofue_lag_robustness.csv"
]

# All four CSVs must exist to rebuild the figures.
required_for_rebuild = csv_files

missing = [fn for fn in required_for_rebuild if not os.path.exists(fn)]
if missing:
    raise FileNotFoundError(f"Missing required CSV(s): {missing}")

# ------------------------------------------------------------
# 1) Save previous cell code as .py
# ------------------------------------------------------------
ip = get_ipython()
input_history = ip.user_ns.get("In", [])

if len(input_history) >= 2:
    previous_cell_code = input_history[-2]
else:
    previous_cell_code = "# Previous cell code not found."

with open(py_name, "w", encoding="utf-8") as f:
    f.write(previous_cell_code)

# ------------------------------------------------------------
# 2) Load saved data
# ------------------------------------------------------------
gaia = pd.read_csv("gaia_poggio_clean_galactocentric.csv")
sofue = pd.read_csv("sofue_residuals_simple.csv")
summary_df = pd.read_csv("gaia_sofue_phase_scale_summary.csv")
robust = pd.read_csv("gaia_sofue_lag_robustness.csv")

# ------------------------------------------------------------
# Helpers (same logic as working cell)
# ------------------------------------------------------------
def build_gaia_profile(gaia_df, bin_width=1.0, min_count=10):
    df = gaia_df.copy()
    rmin = max(0.0, np.floor(df["R_gc_kpc"].min()))
    rmax = min(35.0, np.ceil(df["R_gc_kpc"].max()))
    bins = np.arange(rmin, rmax + bin_width, bin_width)
    df["R_bin"] = pd.cut(df["R_gc_kpc"], bins=bins)

    prof = df.groupby("R_bin", observed=True).agg(
        R_mean=("R_gc_kpc", "mean"),
        z_mean=("z_kpc", "mean"),
        z_std=("z_kpc", "std"),
        N=("z_kpc", "size")
    ).reset_index(drop=True)

    prof["z_sem"] = prof["z_std"] / np.sqrt(prof["N"])
    prof = prof[prof["N"] >= min_count].copy()
    return prof

def interpolate_common(sofue_df, prof_df, r_lo=None, r_hi=None, npts=400):
    if r_lo is None:
        r_lo = max(float(sofue_df["radius_kpc"].min()), float(prof_df["R_mean"].min()))
    if r_hi is None:
        r_hi = min(float(sofue_df["radius_kpc"].max()), float(prof_df["R_mean"].max()))
    common_r = np.linspace(r_lo, r_hi, npts)

    f_sof = interp1d(
        sofue_df["radius_kpc"], sofue_df["residual_kms"],
        kind="linear", fill_value="extrapolate"
    )
    f_gai = interp1d(
        prof_df["R_mean"], prof_df["z_mean"],
        kind="linear", fill_value="extrapolate"
    )

    sof = f_sof(common_r)
    gai = f_gai(common_r)
    return common_r, sof, gai

def normalize(x):
    x = np.asarray(x)
    s = np.std(x)
    if s == 0:
        return x * 0.0
    return (x - np.mean(x)) / s

def dominant_wavelength(x, y):
    x = np.asarray(x)
    y = np.asarray(y)
    dx = x[1] - x[0]
    y0 = y - np.mean(y)

    fft = np.fft.rfft(y0)
    freqs = np.fft.rfftfreq(len(y0), d=dx)

    power = np.abs(fft) ** 2
    if len(power) < 3:
        return np.nan, freqs, power

    power[0] = 0.0
    idx = np.argmax(power)
    f_dom = freqs[idx]

    if f_dom <= 0:
        return np.nan, freqs, power

    wavelength = 1.0 / f_dom
    return wavelength, freqs, power

def best_lag_kpc(x, y, r_grid):
    xn = normalize(x)
    yn = normalize(y)
    corr = correlate(xn, yn, mode="full")
    lags = np.arange(-len(r_grid) + 1, len(r_grid))
    lag_kpc = lags * (r_grid[1] - r_grid[0])
    idx = np.argmax(corr)
    return lag_kpc[idx], corr, lag_kpc

def save_both(fig_num):
    png_name = f"{base_name}_fig{fig_num}.png"
    pdf_name = f"{base_name}_fig{fig_num}.pdf"
    plt.savefig(png_name, dpi=300, bbox_inches="tight")
    plt.savefig(pdf_name, bbox_inches="tight")
    saved_pngs.append(png_name)
    saved_pdfs.append(pdf_name)

def plot_power(freqs, power, title, fig_num):
    mask = freqs > 0
    wl = 1.0 / freqs[mask]
    pw = power[mask]
    idx = np.argsort(wl)
    wl = wl[idx]
    pw = pw[idx]

    plt.figure(figsize=(8,4))
    plt.plot(wl, pw)
    plt.xlim(0, min(40, np.nanmax(wl)))
    plt.xlabel("Wavelength [kpc]")
    plt.ylabel("Power")
    plt.title(title)
    plt.grid(True)
    plt.tight_layout()
    save_both(fig_num)
    plt.show()

saved_pngs = []
saved_pdfs = []

# ------------------------------------------------------------
# 3) Recompute baseline products exactly as in working cell
# ------------------------------------------------------------
prof_1 = build_gaia_profile(gaia, bin_width=1.0, min_count=10)
common_r, sof_i, gai_i = interpolate_common(sofue, prof_1, npts=500)

sof_s = gaussian_filter1d(sof_i, sigma=6)
gai_s = gaussian_filter1d(gai_i, sigma=6)

sof_n = normalize(sof_i)
gai_n = normalize(gai_i)
sof_sn = normalize(sof_s)
gai_sn = normalize(gai_s)

lam_sof_raw, f_sof_raw, p_sof_raw = dominant_wavelength(common_r, sof_i)
lam_gai_raw, f_gai_raw, p_gai_raw = dominant_wavelength(common_r, gai_i)
lam_sof_s, f_sof_s, p_sof_s = dominant_wavelength(common_r, sof_s)
lam_gai_s, f_gai_s, p_gai_s = dominant_wavelength(common_r, gai_s)

lag_raw, corr_raw, lag_axis_raw = best_lag_kpc(sof_i, gai_i, common_r)
lag_s, corr_s, lag_axis_s = best_lag_kpc(sof_s, gai_s, common_r)

# ------------------------------------------------------------
# FIGURE 1: 2x2 baseline plots
# ------------------------------------------------------------
fig, axes = plt.subplots(2, 2, figsize=(13, 9))

axes[0,0].plot(common_r, sof_n, label="Sofue raw (norm)")
axes[0,0].plot(common_r, gai_n, label="Gaia raw (norm)")
axes[0,0].axhline(0, ls="--", c="gray")
axes[0,0].set_title("Raw normalized signals")
axes[0,0].set_xlabel("Radius [kpc]")
axes[0,0].set_ylabel("Amplitude")
axes[0,0].legend()
axes[0,0].grid(True)

axes[0,1].plot(common_r, sof_sn, label="Sofue smoothed (norm)")
axes[0,1].plot(common_r, gai_sn, label="Gaia smoothed (norm)")
axes[0,1].axhline(0, ls="--", c="gray")
axes[0,1].set_title("Smoothed normalized signals")
axes[0,1].set_xlabel("Radius [kpc]")
axes[0,1].set_ylabel("Amplitude")
axes[0,1].legend()
axes[0,1].grid(True)

axes[1,0].plot(lag_axis_raw, corr_raw)
axes[1,0].axvline(lag_raw, color="red", ls="--", label=f"Best lag = {lag_raw:.2f} kpc")
axes[1,0].set_title("Cross-correlation (raw)")
axes[1,0].set_xlabel("Lag [kpc]")
axes[1,0].set_ylabel("Correlation")
axes[1,0].legend()
axes[1,0].grid(True)

axes[1,1].plot(lag_axis_s, corr_s)
axes[1,1].axvline(lag_s, color="red", ls="--", label=f"Best lag = {lag_s:.2f} kpc")
axes[1,1].set_title("Cross-correlation (smoothed)")
axes[1,1].set_xlabel("Lag [kpc]")
axes[1,1].set_ylabel("Correlation")
axes[1,1].legend()
axes[1,1].grid(True)

plt.tight_layout()
save_both(1)
plt.show()

# ------------------------------------------------------------
# FIGURES 2–5: Power spectra
# ------------------------------------------------------------
plot_power(f_sof_raw, p_sof_raw, "Sofue raw: power vs wavelength", 2)
plot_power(f_gai_raw, p_gai_raw, "Gaia raw: power vs wavelength", 3)
plot_power(f_sof_s, p_sof_s, "Sofue smoothed: power vs wavelength", 4)
plot_power(f_gai_s, p_gai_s, "Gaia smoothed: power vs wavelength", 5)

# ------------------------------------------------------------
# FIGURE 6: Lag robustness
# ------------------------------------------------------------
plt.figure(figsize=(8,4))
plt.plot(robust["gaia_bin_width_kpc"], robust["best_lag_kpc"], marker="o")
plt.axhline(np.mean(robust["best_lag_kpc"]), color="red", ls="--",
            label=f"Mean lag = {np.mean(robust['best_lag_kpc']):.2f} kpc")
plt.xlabel("Gaia bin width [kpc]")
plt.ylabel("Best lag [kpc]")
plt.title("Lag robustness vs Gaia radial bin width")
plt.legend()
plt.grid(True)
plt.tight_layout()
save_both(6)
plt.show()

# ------------------------------------------------------------
# 4) Build ZIP bundle
# ------------------------------------------------------------
with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    if os.path.exists(py_name):
        zf.write(py_name)

    for fn in saved_pngs + saved_pdfs:
        if os.path.exists(fn):
            zf.write(fn)

    for fn in csv_files:
        if os.path.exists(fn):
            zf.write(fn)

# ------------------------------------------------------------
# 5) Print saved files
# ------------------------------------------------------------
print("\nSaved files:")
for fn in [py_name] + saved_pngs + saved_pdfs + csv_files + [zip_name]:
    if os.path.exists(fn):
        print(" -", fn)
    else:
        print(" - MISSING:", fn)

# ------------------------------------------------------------
# 6) Download
# ------------------------------------------------------------
files.download(zip_name)



⸻
✅✅✅ 


9.5 Stress Test

GAIA vs SOFUE STRESS TEST

# ============================================================
# GAIA vs SOFUE STRESS TEST CELL
# Uses real files:
#   1) gaia_poggio_clean_galactocentric.csv
#   2) sofue_residuals_simple.csv
#
# Tests:
#   - Lomb-Scargle on unevenly spaced signals
#   - FFT after detrending (none / linear / quadratic)
#   - cross-correlation lag
#   - phase-shuffle bootstrap significance
#
# Saves:
#   - figures (PNG + PDF)
#   - CSV summaries
#   - .py snapshot
#   - one ZIP bundle
# ============================================================

import os
import zipfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy.interpolate import interp1d
from scipy.signal import correlate, lombscargle
from scipy.ndimage import gaussian_filter1d
from google.colab import files
from IPython import get_ipython

# ------------------------------------------------------------
# Load files (upload only if needed)
# ------------------------------------------------------------
try:
    gaia = pd.read_csv("gaia_poggio_clean_galactocentric.csv")
    sofue = pd.read_csv("sofue_residuals_simple.csv")
    print("Loaded local files from session.")
except FileNotFoundError:
    print("Upload these two files now:")
    print(" - gaia_poggio_clean_galactocentric.csv")
    print(" - sofue_residuals_simple.csv")
    uploaded = files.upload()

    gaia_file = None
    sofue_file = None
    for fn in uploaded.keys():
        low = fn.lower()
        if "gaia_poggio_clean_galactocentric" in low:
            gaia_file = fn
        if "sofue_residuals_simple" in low:
            sofue_file = fn

    if gaia_file is None or sofue_file is None:
        raise ValueError("Could not identify one or both uploaded files.")

    gaia = pd.read_csv(gaia_file)
    sofue = pd.read_csv(sofue_file)

print("\nGaia columns:", gaia.columns.tolist())
print("Sofue columns:", sofue.columns.tolist())

# ------------------------------------------------------------
# Helpers
# ------------------------------------------------------------
def build_gaia_profile(gaia_df, bin_width=1.0, min_count=10):
    df = gaia_df.copy()
    rmin = max(0.0, np.floor(df["R_gc_kpc"].min()))
    rmax = min(35.0, np.ceil(df["R_gc_kpc"].max()))
    bins = np.arange(rmin, rmax + bin_width, bin_width)
    df["R_bin"] = pd.cut(df["R_gc_kpc"], bins=bins)

    prof = df.groupby("R_bin", observed=True).agg(
        R_mean=("R_gc_kpc", "mean"),
        z_mean=("z_kpc", "mean"),
        z_std=("z_kpc", "std"),
        N=("z_kpc", "size")
    ).reset_index(drop=True)

    prof["z_sem"] = prof["z_std"] / np.sqrt(prof["N"])
    prof = prof[prof["N"] >= min_count].copy()
    return prof

def interpolate_common(sofue_df, prof_df, npts=600):
    r_lo = max(float(sofue_df["radius_kpc"].min()), float(prof_df["R_mean"].min()))
    r_hi = min(float(sofue_df["radius_kpc"].max()), float(prof_df["R_mean"].max()))
    common_r = np.linspace(r_lo, r_hi, npts)

    f_sof = interp1d(
        sofue_df["radius_kpc"], sofue_df["residual_kms"],
        kind="linear", fill_value="extrapolate"
    )
    f_gai = interp1d(
        prof_df["R_mean"], prof_df["z_mean"],
        kind="linear", fill_value="extrapolate"
    )

    sof = f_sof(common_r)
    gai = f_gai(common_r)
    return common_r, sof, gai

def normalize(x):
    x = np.asarray(x)
    s = np.std(x)
    if s == 0:
        return x * 0.0
    return (x - np.mean(x)) / s

def detrend_poly(x, y, degree=0):
    x = np.asarray(x)
    y = np.asarray(y)
    if degree == 0:
        trend = np.mean(y) * np.ones_like(y)
    else:
        coeffs = np.polyfit(x, y, deg=degree)
        trend = np.polyval(coeffs, x)
    return y - trend, trend

def fft_power_vs_wavelength(x, y):
    x = np.asarray(x)
    y = np.asarray(y)
    dx = x[1] - x[0]
    y0 = y - np.mean(y)

    fft = np.fft.rfft(y0)
    freqs = np.fft.rfftfreq(len(y0), d=dx)
    power = np.abs(fft) ** 2

    mask = freqs > 0
    wl = 1.0 / freqs[mask]
    pw = power[mask]

    idx = np.argsort(wl)
    return wl[idx], pw[idx]

def dominant_fft_wavelength(x, y, wl_min=2.0, wl_max=20.0):
    wl, pw = fft_power_vs_wavelength(x, y)
    mask = (wl >= wl_min) & (wl <= wl_max)
    if not np.any(mask):
        return np.nan, wl, pw
    wls = wl[mask]
    pws = pw[mask]
    return float(wls[np.argmax(pws)]), wl, pw

def lomb_scargle_period(x, y, pmin=2.0, pmax=20.0, nfreq=5000):
    x = np.asarray(x)
    y = np.asarray(y)
    y = y - np.mean(y)

    periods = np.linspace(pmin, pmax, nfreq)
    freqs = 2.0 * np.pi / periods  # lombscargle expects angular frequencies
    power = lombscargle(x, y, freqs, normalize=True)

    best_period = periods[np.argmax(power)]
    return best_period, periods, power

def best_lag_kpc(x, y, r_grid):
    xn = normalize(x)
    yn = normalize(y)
    corr = correlate(xn, yn, mode="full")
    lags = np.arange(-len(r_grid) + 1, len(r_grid))
    lag_kpc = lags * (r_grid[1] - r_grid[0])
    idx = np.argmax(np.abs(corr))
    return float(lag_kpc[idx]), corr, lag_kpc, float(corr[idx])

def phase_shuffle_same_spectrum(signal, rng):
    """
    Preserve Fourier amplitudes, randomize phases.
    Input must be evenly sampled.
    """
    y = np.asarray(signal)
    y0 = y - np.mean(y)
    fft = np.fft.rfft(y0)

    amps = np.abs(fft)
    phases = np.angle(fft)

    new_phases = phases.copy()
    if len(new_phases) > 2:
        new_phases[1:-1] = rng.uniform(0, 2*np.pi, size=len(new_phases)-2)

    new_fft = amps * np.exp(1j * new_phases)
    y_shuff = np.fft.irfft(new_fft, n=len(y))
    return y_shuff

def bootstrap_phase_shuffle_pvalue(x_signal, y_signal, r_grid, n_boot=2000, seed=42):
    rng = np.random.default_rng(seed)

    obs_lag, obs_corr, obs_lag_axis, obs_peak = best_lag_kpc(x_signal, y_signal, r_grid)

    null_peaks = []
    null_lags = []

    for _ in range(n_boot):
        y_shuff = phase_shuffle_same_spectrum(y_signal, rng)
        lag_b, corr_b, lag_axis_b, peak_b = best_lag_kpc(x_signal, y_shuff, r_grid)
        null_peaks.append(abs(peak_b))
        null_lags.append(lag_b)

    null_peaks = np.array(null_peaks)
    null_lags = np.array(null_lags)

    p_value = float(np.mean(null_peaks >= abs(obs_peak)))
    return {
        "obs_lag_kpc": obs_lag,
        "obs_peak_corr": obs_peak,
        "null_peaks": null_peaks,
        "null_lags": null_lags,
        "p_value": p_value
    }

# ------------------------------------------------------------
# Build baseline Gaia profile
# ------------------------------------------------------------
prof_1 = build_gaia_profile(gaia, bin_width=1.0, min_count=10)

# Unevenly spaced signals for Lomb-Scargle
x_sof_uneven = sofue["radius_kpc"].values
y_sof_uneven = sofue["residual_kms"].values

x_gai_uneven = prof_1["R_mean"].values
y_gai_uneven = prof_1["z_mean"].values

# Interpolated common grid for FFT / xcorr / bootstrap
common_r, sof_i, gai_i = interpolate_common(sofue, prof_1, npts=600)

# Detrended versions
sof_none, sof_none_trend = detrend_poly(common_r, sof_i, degree=0)
gai_none, gai_none_trend = detrend_poly(common_r, gai_i, degree=0)

sof_lin, sof_lin_trend = detrend_poly(common_r, sof_i, degree=1)
gai_lin, gai_lin_trend = detrend_poly(common_r, gai_i, degree=1)

sof_quad, sof_quad_trend = detrend_poly(common_r, sof_i, degree=2)
gai_quad, gai_quad_trend = detrend_poly(common_r, gai_i, degree=2)

# Mild smoothing versions for large-scale comparison
sof_sm = gaussian_filter1d(sof_i, sigma=6)
gai_sm = gaussian_filter1d(gai_i, sigma=6)

# ------------------------------------------------------------
# Lomb-Scargle on unevenly spaced raw profiles
# ------------------------------------------------------------
ls_sof_period, ls_sof_periods, ls_sof_power = lomb_scargle_period(x_sof_uneven, y_sof_uneven, pmin=2.0, pmax=20.0)
ls_gai_period, ls_gai_periods, ls_gai_power = lomb_scargle_period(x_gai_uneven, y_gai_uneven, pmin=2.0, pmax=20.0)

# Get top few peaks
def top_periods(periods, power, n=5):
    idx = np.argsort(power)[::-1][:n]
    out = pd.DataFrame({
        "period_kpc": periods[idx],
        "power": power[idx]
    }).sort_values("period_kpc").reset_index(drop=True)
    return out

ls_sof_top = top_periods(ls_sof_periods, ls_sof_power, n=5)
ls_gai_top = top_periods(ls_gai_periods, ls_gai_power, n=5)

# ------------------------------------------------------------
# FFT dominant wavelengths after detrending
# ------------------------------------------------------------
fft_rows = []

for label, ys, yg in [
    ("none", sof_none, gai_none),
    ("linear", sof_lin, gai_lin),
    ("quadratic", sof_quad, gai_quad),
    ("smoothed", sof_sm, gai_sm),
]:
    lam_s, _, _ = dominant_fft_wavelength(common_r, ys, wl_min=2.0, wl_max=20.0)
    lam_g, _, _ = dominant_fft_wavelength(common_r, yg, wl_min=2.0, wl_max=20.0)

    lag_kpc, corr, lag_axis, peak_corr = best_lag_kpc(ys, yg, common_r)

    fft_rows.append({
        "mode": label,
        "sofue_dom_wavelength_kpc": lam_s,
        "gaia_dom_wavelength_kpc": lam_g,
        "best_lag_kpc": lag_kpc,
        "peak_corr": peak_corr
    })

fft_summary = pd.DataFrame(fft_rows)

# ------------------------------------------------------------
# Phase-shuffle bootstrap on cross-correlation peak
# ------------------------------------------------------------
boot_rows = []

for label, ys, yg in [
    ("raw_none", sof_none, gai_none),
    ("linear_detrend", sof_lin, gai_lin),
    ("quadratic_detrend", sof_quad, gai_quad),
    ("smoothed", sof_sm, gai_sm),
]:
    res = bootstrap_phase_shuffle_pvalue(ys, yg, common_r, n_boot=2000, seed=42)
    boot_rows.append({
        "mode": label,
        "obs_lag_kpc": res["obs_lag_kpc"],
        "obs_peak_corr": res["obs_peak_corr"],
        "p_value": res["p_value"]
    })

    if label == "raw_none":
        boot_raw = res
    elif label == "linear_detrend":
        boot_lin = res
    elif label == "quadratic_detrend":
        boot_quad = res
    elif label == "smoothed":
        boot_sm = res

bootstrap_summary = pd.DataFrame(boot_rows)

# ------------------------------------------------------------
# Print summaries
# ------------------------------------------------------------
print("\n=== Lomb-Scargle dominant periods ===")
print(f"Sofue best period: {ls_sof_period:.2f} kpc")
print(f"Gaia  best period: {ls_gai_period:.2f} kpc")

print("\nTop Sofue Lomb-Scargle periods:")
print(ls_sof_top.to_string(index=False))

print("\nTop Gaia Lomb-Scargle periods:")
print(ls_gai_top.to_string(index=False))

print("\n=== FFT / lag summary ===")
print(fft_summary.to_string(index=False))

print("\n=== Bootstrap significance summary ===")
print(bootstrap_summary.to_string(index=False))

# ------------------------------------------------------------
# Save CSV summaries
# ------------------------------------------------------------
ls_sof_top.to_csv("gaia_sofue_lomb_scargle_sofue_top.csv", index=False)
ls_gai_top.to_csv("gaia_sofue_lomb_scargle_gaia_top.csv", index=False)
fft_summary.to_csv("gaia_sofue_fft_detrend_summary.csv", index=False)
bootstrap_summary.to_csv("gaia_sofue_bootstrap_summary.csv", index=False)

# ------------------------------------------------------------
# Figures
# ------------------------------------------------------------
saved_pngs = []
saved_pdfs = []

def save_both(fig_name):
    png_name = f"{fig_name}.png"
    pdf_name = f"{fig_name}.pdf"
    plt.savefig(png_name, dpi=300, bbox_inches="tight")
    plt.savefig(pdf_name, bbox_inches="tight")
    saved_pngs.append(png_name)
    saved_pdfs.append(pdf_name)

# Fig 1: Lomb-Scargle
plt.figure(figsize=(10,5))
plt.plot(ls_sof_periods, ls_sof_power, label="Sofue")
plt.plot(ls_gai_periods, ls_gai_power, label="Gaia")
plt.axvline(ls_sof_period, color="tab:blue", ls="--", alpha=0.7)
plt.axvline(ls_gai_period, color="tab:orange", ls="--", alpha=0.7)
plt.xlabel("Period / wavelength [kpc]")
plt.ylabel("Lomb–Scargle power")
plt.title("Lomb–Scargle periodograms (unevenly spaced raw profiles)")
plt.legend()
plt.grid(True)
plt.tight_layout()
save_both("gaia_sofue_stresstest_fig1_lombscargle")
plt.show()

# Fig 2: Detrended signal comparison
fig, axes = plt.subplots(3, 1, figsize=(11, 10), sharex=True)

axes[0].plot(common_r, normalize(sof_none), label="Sofue")
axes[0].plot(common_r, normalize(gai_none), label="Gaia")
axes[0].set_title("No detrending")
axes[0].grid(True)
axes[0].legend()

axes[1].plot(common_r, normalize(sof_lin), label="Sofue")
axes[1].plot(common_r, normalize(gai_lin), label="Gaia")
axes[1].set_title("Linear detrending")
axes[1].grid(True)
axes[1].legend()

axes[2].plot(common_r, normalize(sof_quad), label="Sofue")
axes[2].plot(common_r, normalize(gai_quad), label="Gaia")
axes[2].set_title("Quadratic detrending")
axes[2].set_xlabel("Radius [kpc]")
axes[2].grid(True)
axes[2].legend()

plt.tight_layout()
save_both("gaia_sofue_stresstest_fig2_detrended_signals")
plt.show()

# Fig 3: FFT power after detrending
plt.figure(figsize=(10,5))
for label, ys, yg, style in [
    ("Sofue linear", sof_lin, None, "-"),
    ("Gaia linear", None, gai_lin, "--"),
    ("Sofue quadratic", sof_quad, None, "-."),
    ("Gaia quadratic", None, gai_quad, ":"),
]:
    if ys is not None:
        wl, pw = fft_power_vs_wavelength(common_r, ys)
        mask = (wl >= 2.0) & (wl <= 20.0)
        plt.plot(wl[mask], pw[mask], style, label=label)
    if yg is not None:
        wl, pw = fft_power_vs_wavelength(common_r, yg)
        mask = (wl >= 2.0) & (wl <= 20.0)
        plt.plot(wl[mask], pw[mask], style, label=label)

plt.xlabel("Wavelength [kpc]")
plt.ylabel("FFT power")
plt.title("FFT power spectra after detrending")
plt.legend()
plt.grid(True)
plt.tight_layout()
save_both("gaia_sofue_stresstest_fig3_fft_detrended")
plt.show()

# Fig 4: Cross-correlation comparisons
fig, axes = plt.subplots(2, 2, figsize=(12,8), sharex=False, sharey=False)

for ax, title, ys, yg in [
    (axes[0,0], "No detrending", sof_none, gai_none),
    (axes[0,1], "Linear detrending", sof_lin, gai_lin),
    (axes[1,0], "Quadratic detrending", sof_quad, gai_quad),
    (axes[1,1], "Smoothed", sof_sm, gai_sm),
]:
    lag_kpc, corr, lag_axis, peak_corr = best_lag_kpc(ys, yg, common_r)
    ax.plot(lag_axis, corr)
    ax.axvline(lag_kpc, color="red", ls="--", label=f"Lag = {lag_kpc:.2f} kpc")
    ax.set_title(title)
    ax.set_xlabel("Lag [kpc]")
    ax.set_ylabel("Correlation")
    ax.legend()
    ax.grid(True)

plt.tight_layout()
save_both("gaia_sofue_stresstest_fig4_crosscorrelation")
plt.show()

# Fig 5: Bootstrap null distributions
fig, axes = plt.subplots(2, 2, figsize=(12,8))

for ax, title, boot in [
    (axes[0,0], "Raw / none", boot_raw),
    (axes[0,1], "Linear detrend", boot_lin),
    (axes[1,0], "Quadratic detrend", boot_quad),
    (axes[1,1], "Smoothed", boot_sm),
]:
    ax.hist(boot["null_peaks"], bins=40, alpha=0.8)
    ax.axvline(abs(boot["obs_peak_corr"]), color="red", ls="--",
               label=f"Obs = {boot['obs_peak_corr']:.3f}\np = {boot['p_value']:.3f}")
    ax.set_title(title)
    ax.set_xlabel("Max |cross-correlation|")
    ax.set_ylabel("Count")
    ax.legend()
    ax.grid(True)

plt.tight_layout()
save_both("gaia_sofue_stresstest_fig5_bootstrap_null")
plt.show()

# Fig 6: Null lag distributions
fig, axes = plt.subplots(2, 2, figsize=(12,8))

for ax, title, boot in [
    (axes[0,0], "Raw / none", boot_raw),
    (axes[0,1], "Linear detrend", boot_lin),
    (axes[1,0], "Quadratic detrend", boot_quad),
    (axes[1,1], "Smoothed", boot_sm),
]:
    ax.hist(boot["null_lags"], bins=40, alpha=0.8)
    ax.axvline(boot["obs_lag_kpc"], color="red", ls="--",
               label=f"Obs lag = {boot['obs_lag_kpc']:.2f} kpc")
    ax.set_title(title)
    ax.set_xlabel("Best lag [kpc]")
    ax.set_ylabel("Count")
    ax.legend()
    ax.grid(True)

plt.tight_layout()
save_both("gaia_sofue_stresstest_fig6_null_lags")
plt.show()

# ------------------------------------------------------------
# Save .py snapshot of this current cell
# ------------------------------------------------------------
ip = get_ipython()
input_history = ip.user_ns.get("In", [])
current_cell_code = input_history[-1] if len(input_history) >= 1 else "# Current cell code not found."

py_name = "gaia_sofue_stresstest.py"
with open(py_name, "w", encoding="utf-8") as f:
    f.write(current_cell_code)

# ------------------------------------------------------------
# Build ZIP bundle
# ------------------------------------------------------------
zip_name = "gaia_sofue_stresstest_bundle.zip"
with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for fn in (
        saved_pngs
        + saved_pdfs
        + [
            py_name,
            "gaia_sofue_lomb_scargle_sofue_top.csv",
            "gaia_sofue_lomb_scargle_gaia_top.csv",
            "gaia_sofue_fft_detrend_summary.csv",
            "gaia_sofue_bootstrap_summary.csv",
        ]
    ):
        if os.path.exists(fn):
            zf.write(fn)

print("\nSaved:")
for fn in (
    saved_pngs
    + saved_pdfs
    + [
        py_name,
        "gaia_sofue_lomb_scargle_sofue_top.csv",
        "gaia_sofue_lomb_scargle_gaia_top.csv",
        "gaia_sofue_fft_detrend_summary.csv",
        "gaia_sofue_bootstrap_summary.csv",
        zip_name,
    ]
):
    if os.path.exists(fn):
        print(" -", fn)

files.download(zip_name)
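A quick self-contained check of the Lomb-Scargle convention used in lomb_scargle_period above (illustrative, synthetic data): scipy's lombscargle takes angular frequencies, so a period grid must be converted as 2*pi / period. The estimator should recover a known period from unevenly spaced samples.

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 30.0, 80))    # unevenly spaced radii [kpc]
true_period = 5.0
y = np.sin(2.0 * np.pi * x / true_period)
y = y - np.mean(y)

periods = np.linspace(2.0, 20.0, 5000)
# Convert trial periods to the angular frequencies lombscargle expects.
power = lombscargle(x, y, 2.0 * np.pi / periods, normalize=True)
best_period = periods[np.argmax(power)]
print(f"Recovered period: {best_period:.2f} kpc")
```

With noise-free synthetic input the periodogram peak sits at the injected period to within the resolution of the period grid.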

✅✅✅ 
Cell output (see figures 69-74)

Loaded local files from session.

Gaia columns: ['source_id', 'ra', 'dec', 'l', 'b', 'parallax', 'parallax_error', 'pmra', 'pmdec', 'radial_velocity', 'radial_velocity_error', 'd_kpc', 'x_kpc', 'y_kpc', 'z_kpc', 'R_gc_kpc']
Sofue columns: ['radius_kpc', 'residual_kms']

=== Lomb-Scargle dominant periods ===
Sofue best period: 5.06 kpc
Gaia  best period: 4.18 kpc

Top Sofue Lomb-Scargle periods:
 period_kpc    power
   5.057011 0.202707
   5.060612 0.202748
   5.064213 0.202762
   5.067814 0.202751
   5.071414 0.202713

Top Gaia Lomb-Scargle periods:
 period_kpc    power
   4.171234 0.140664
   4.174835 0.140714
   4.178436 0.140732
   4.182036 0.140717
   4.185637 0.140671

=== FFT / lag summary ===
     mode  sofue_dom_wavelength_kpc  gaia_dom_wavelength_kpc  best_lag_kpc   peak_corr
     none                  12.65809                 6.329045     10.801570  285.246401
   linear                  12.65809                12.658090      2.405037 -361.760684
quadratic                  12.65809                12.658090      2.489424 -332.523293
 smoothed                  12.65809                 6.329045     10.717183  290.098479

=== Bootstrap significance summary ===
             mode  obs_lag_kpc  obs_peak_corr  p_value
         raw_none    10.801570     285.246401   0.7920
   linear_detrend     2.405037    -361.760684   0.0020
quadratic_detrend     2.489424    -332.523293   0.0645
         smoothed    10.717183     290.098479   0.7895
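
The p-values above rest on the phase-shuffle surrogate being a fair null draw: it must preserve the Fourier amplitude spectrum of the input while scrambling its phase structure. A minimal standalone sketch of the same construction as phase_shuffle_same_spectrum (illustrative, synthetic data) confirms this property directly:

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0.0, 6.0 * np.pi, 256)) + 0.1 * rng.normal(size=256)
y0 = y - np.mean(y)

fft = np.fft.rfft(y0)
amps, phases = np.abs(fft), np.angle(fft)

# Randomize interior phases only; DC and Nyquist bins are left untouched so
# the inverse transform remains real-valued.
new_phases = phases.copy()
new_phases[1:-1] = rng.uniform(0.0, 2.0 * np.pi, size=len(new_phases) - 2)
surrogate = np.fft.irfft(amps * np.exp(1j * new_phases), n=len(y0))

# The surrogate's amplitude spectrum matches the original's to float precision.
amp_diff = np.max(np.abs(np.abs(np.fft.rfft(surrogate)) - amps))
print(f"Max amplitude-spectrum difference: {amp_diff:.1e}")
```

Because each surrogate keeps the power spectrum fixed, the bootstrap tests only phase alignment between the Sofue and Gaia signals, not their shared spectral content.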

⸻

9.6 Shared-Mode Phase Test

# ============================================================
# REFINED SHARED-MODE PHASE TEST
# Joint fit with shared wavelength, separate phases/amplitudes
# Uses:
#   1) gaia_poggio_clean_galactocentric.csv
#   2) milky_way_sofue_v15_residuals.csv
# ============================================================

import os
import zipfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy.interpolate import interp1d
from scipy.optimize import curve_fit
from scipy.ndimage import gaussian_filter1d
from google.colab import files
from IPython import get_ipython

# ------------------------------------------------------------
# Load files
# ------------------------------------------------------------
try:
    gaia = pd.read_csv("gaia_poggio_clean_galactocentric.csv")
    sofue = pd.read_csv("milky_way_sofue_v15_residuals.csv")
    print("Loaded local files from session.")
except FileNotFoundError:
    print("Upload these two files now:")
    print(" - gaia_poggio_clean_galactocentric.csv")
    print(" - milky_way_sofue_v15_residuals.csv")
    uploaded = files.upload()

    gaia_file = None
    sofue_file = None
    for fn in uploaded.keys():
        low = fn.lower()
        if "gaia_poggio_clean_galactocentric" in low:
            gaia_file = fn
        if "milky_way_sofue_v15_residuals" in low:
            sofue_file = fn

    if gaia_file is None or sofue_file is None:
        raise ValueError("Could not identify one or both uploaded files.")

    gaia = pd.read_csv(gaia_file)
    sofue = pd.read_csv(sofue_file)

sofue = sofue[["radius_kpc", "residual_kms"]].copy()

# ------------------------------------------------------------
# Helpers
# ------------------------------------------------------------
def build_gaia_profile(gaia_df, bin_width=1.0, min_count=10):
    df = gaia_df.copy()
    rmin = max(0.0, np.floor(df["R_gc_kpc"].min()))
    rmax = min(35.0, np.ceil(df["R_gc_kpc"].max()))
    bins = np.arange(rmin, rmax + bin_width, bin_width)
    df["R_bin"] = pd.cut(df["R_gc_kpc"], bins=bins)

    prof = df.groupby("R_bin", observed=True).agg(
        R_mean=("R_gc_kpc", "mean"),
        z_mean=("z_kpc", "mean"),
        z_std=("z_kpc", "std"),
        N=("z_kpc", "size")
    ).reset_index(drop=True)

    prof["z_sem"] = prof["z_std"] / np.sqrt(prof["N"])
    prof = prof[prof["N"] >= min_count].copy()
    return prof

def interpolate_common(sofue_df, prof_df, npts=800):
    r_lo = max(float(sofue_df["radius_kpc"].min()), float(prof_df["R_mean"].min()))
    r_hi = min(float(sofue_df["radius_kpc"].max()), float(prof_df["R_mean"].max()))
    common_r = np.linspace(r_lo, r_hi, npts)

    f_sof = interp1d(
        sofue_df["radius_kpc"], sofue_df["residual_kms"],
        kind="linear", fill_value="extrapolate"
    )
    f_gai = interp1d(
        prof_df["R_mean"], prof_df["z_mean"],
        kind="linear", fill_value="extrapolate"
    )

    sof = f_sof(common_r)
    gai = f_gai(common_r)
    return common_r, sof, gai

def detrend_poly(x, y, degree=2):
    coeffs = np.polyfit(x, y, deg=degree)
    trend = np.polyval(coeffs, x)
    return y - trend, trend

def wrap_phase(ph):
    return (ph + np.pi) % (2*np.pi) - np.pi

def phase_to_radial_lag(delta_phi, lam):
    return (delta_phi / (2.0 * np.pi)) * lam

# Joint shared-lambda model (reference form; not called directly below,
# where the grid search fits each series separately at each trial lambda)
def shared_lambda_model(x, A1, phi1, c1, A2, phi2, c2, lam):
    y1 = A1 * np.sin(2*np.pi*x/lam + phi1) + c1
    y2 = A2 * np.sin(2*np.pi*x/lam + phi2) + c2
    return y1, y2

def fit_joint_shared_lambda(x, y1, y2, lam_min=4.0, lam_max=6.0, n_lam=500):
    lam_grid = np.linspace(lam_min, lam_max, n_lam)
    best = None

    A10 = 0.5 * (np.nanmax(y1) - np.nanmin(y1))
    A20 = 0.5 * (np.nanmax(y2) - np.nanmin(y2))
    c10 = np.nanmean(y1)
    c20 = np.nanmean(y2)

    for lam in lam_grid:
        try:
            # fit Sofue
            popt1, _ = curve_fit(
                lambda xx, A, phi, c: A * np.sin(2*np.pi*xx/lam + phi) + c,
                x, y1, p0=[A10, 0.0, c10], maxfev=20000
            )
            # fit Gaia
            popt2, _ = curve_fit(
                lambda xx, A, phi, c: A * np.sin(2*np.pi*xx/lam + phi) + c,
                x, y2, p0=[A20, 0.0, c20], maxfev=20000
            )

            y1_fit = popt1[0] * np.sin(2*np.pi*x/lam + popt1[1]) + popt1[2]
            y2_fit = popt2[0] * np.sin(2*np.pi*x/lam + popt2[1]) + popt2[2]

            sse = np.sum((y1 - y1_fit)**2) + np.sum((y2 - y2_fit)**2)

            if (best is None) or (sse < best["sse"]):
                best = {
                    "lam": lam,
                    "A1": popt1[0],
                    "phi1": popt1[1],
                    "c1": popt1[2],
                    "A2": popt2[0],
                    "phi2": popt2[1],
                    "c2": popt2[2],
                    "y1_fit": y1_fit,
                    "y2_fit": y2_fit,
                    "sse": sse
                }
        except Exception:
            continue

    if best is None:
        raise RuntimeError("Joint shared-lambda fit failed.")
    return best, lam_grid

def bootstrap_joint_fit(x, y1, y2, lam_min=4.0, lam_max=6.0, n_boot=500, seed=42):
    rng = np.random.default_rng(seed)
    rows = []
    n = len(x)
    idx_all = np.arange(n)

    for _ in range(n_boot):
        idx = rng.choice(idx_all, size=n, replace=True)
        idx = np.sort(idx)
        xb = x[idx]
        y1b = y1[idx]
        y2b = y2[idx]

        try:
            fitb, _ = fit_joint_shared_lambda(xb, y1b, y2b, lam_min=lam_min, lam_max=lam_max, n_lam=180)
            dphi = wrap_phase(fitb["phi2"] - fitb["phi1"])
            dR = phase_to_radial_lag(dphi, fitb["lam"])
            rows.append({
                "lambda_kpc": fitb["lam"],
                "phi_sofue": fitb["phi1"],
                "phi_gaia": fitb["phi2"],
                "delta_phi": dphi,
                "delta_R_kpc": dR,
                "A_sofue": fitb["A1"],
                "A_gaia": fitb["A2"]
            })
        except Exception:
            continue

    return pd.DataFrame(rows)

# ------------------------------------------------------------
# Build signals
# ------------------------------------------------------------
prof_1 = build_gaia_profile(gaia, bin_width=1.0, min_count=10)
common_r, sof_i, gai_i = interpolate_common(sofue, prof_1, npts=800)

# detrend + mild smoothing
sof_dt, _ = detrend_poly(common_r, sof_i, degree=2)
gai_dt, _ = detrend_poly(common_r, gai_i, degree=2)

sof_use = gaussian_filter1d(sof_dt, sigma=2)
gai_use = gaussian_filter1d(gai_dt, sigma=2)

# restrict to cleaner window
mask = (common_r >= 6.0) & (common_r <= 26.0)
x_fit = common_r[mask]
y_sof = sof_use[mask]
y_gai = gai_use[mask]

# ------------------------------------------------------------
# Fit shared wavelength
# ------------------------------------------------------------
fit, lam_grid = fit_joint_shared_lambda(x_fit, y_sof, y_gai, lam_min=4.0, lam_max=6.0, n_lam=500)

delta_phi = wrap_phase(fit["phi2"] - fit["phi1"])
delta_R = phase_to_radial_lag(delta_phi, fit["lam"])

print("\n=== Refined joint shared-mode fit ===")
print(f"Shared lambda: {fit['lam']:.3f} kpc")
print(f"Sofue phase:   {fit['phi1']:.3f} rad")
print(f"Gaia phase:    {fit['phi2']:.3f} rad")
print(f"Delta phase:   {delta_phi:.3f} rad")
print(f"Radial lag:    {delta_R:.3f} kpc")
print(f"Sofue amplitude: {fit['A1']:.3f}")
print(f"Gaia amplitude:  {fit['A2']:.3f}")

# ------------------------------------------------------------
# Bootstrap
# ------------------------------------------------------------
boot = bootstrap_joint_fit(x_fit, y_sof, y_gai, lam_min=4.0, lam_max=6.0, n_boot=500, seed=42)

if len(boot) == 0:
    raise RuntimeError("Bootstrap returned no successful fits.")

lag_mean = boot["delta_R_kpc"].mean()
lag_std = boot["delta_R_kpc"].std()
lag_med = boot["delta_R_kpc"].median()
lag_lo = boot["delta_R_kpc"].quantile(0.16)
lag_hi = boot["delta_R_kpc"].quantile(0.84)

phi_mean = boot["delta_phi"].mean()
phi_std = boot["delta_phi"].std()

lam_mean = boot["lambda_kpc"].mean()
lam_std = boot["lambda_kpc"].std()

print("\n=== Bootstrap summary ===")
print(f"N successful bootstraps: {len(boot)}")
print(f"Lambda mean ± std: {lam_mean:.3f} ± {lam_std:.3f} kpc")
print(f"Lag mean ± std:    {lag_mean:.3f} ± {lag_std:.3f} kpc")
print(f"Lag median [16,84]%: {lag_med:.3f} [{lag_lo:.3f}, {lag_hi:.3f}] kpc")
print(f"Phase mean ± std:  {phi_mean:.3f} ± {phi_std:.3f} rad")

# ------------------------------------------------------------
# Save tables
# ------------------------------------------------------------
fit_summary = pd.DataFrame([{
    "shared_lambda_kpc": fit["lam"],
    "sofue_phase_rad": fit["phi1"],
    "gaia_phase_rad": fit["phi2"],
    "delta_phase_rad": delta_phi,
    "radial_lag_kpc": delta_R,
    "sofue_amplitude": fit["A1"],
    "gaia_amplitude": fit["A2"],
    "fit_window_rmin_kpc": 6.0,
    "fit_window_rmax_kpc": 26.0
}])

boot_summary = pd.DataFrame([{
    "n_boot": len(boot),
    "lambda_mean_kpc": lam_mean,
    "lambda_std_kpc": lam_std,
    "lag_mean_kpc": lag_mean,
    "lag_std_kpc": lag_std,
    "lag_median_kpc": lag_med,
    "lag_p16_kpc": lag_lo,
    "lag_p84_kpc": lag_hi,
    "phase_mean_rad": phi_mean,
    "phase_std_rad": phi_std
}])

fit_summary.to_csv("gaia_sofue_refined_shortmode_fit_summary.csv", index=False)
boot.to_csv("gaia_sofue_refined_shortmode_bootstrap_samples.csv", index=False)
boot_summary.to_csv("gaia_sofue_refined_shortmode_bootstrap_summary.csv", index=False)

# ------------------------------------------------------------
# Figures
# ------------------------------------------------------------
saved_pngs = []
saved_pdfs = []

def save_both(fig_name):
    png_name = f"{fig_name}.png"
    pdf_name = f"{fig_name}.pdf"
    plt.savefig(png_name, dpi=300, bbox_inches="tight")
    plt.savefig(pdf_name, bbox_inches="tight")
    saved_pngs.append(png_name)
    saved_pdfs.append(pdf_name)

# Figure 1
plt.figure(figsize=(11,5))
plt.plot(x_fit, y_sof, label="Sofue detrended")
plt.plot(x_fit, fit["y1_fit"], ls="--", label="Sofue fit")
plt.plot(x_fit, y_gai, label="Gaia detrended")
plt.plot(x_fit, fit["y2_fit"], ls="--", label="Gaia fit")
plt.axhline(0, ls="--", c="gray")
plt.xlabel("Radius [kpc]")
plt.ylabel("Amplitude")
plt.title(f"Refined shared-mode joint fit (λ={fit['lam']:.2f} kpc)")
plt.legend()
plt.grid(True)
plt.tight_layout()
save_both("gaia_sofue_refined_fig1_joint_fit")
plt.show()

# Figure 2
# SSE scan
lam_scan = np.linspace(4.0, 6.0, 300)
sse_scan = []
for lam in lam_scan:
    try:
        popt1, _ = curve_fit(
            lambda xx, A, phi, c: A * np.sin(2*np.pi*xx/lam + phi) + c,
            x_fit, y_sof, p0=[0.5*(np.max(y_sof)-np.min(y_sof)), 0, np.mean(y_sof)], maxfev=20000
        )
        popt2, _ = curve_fit(
            lambda xx, A, phi, c: A * np.sin(2*np.pi*xx/lam + phi) + c,
            x_fit, y_gai, p0=[0.5*(np.max(y_gai)-np.min(y_gai)), 0, np.mean(y_gai)], maxfev=20000
        )
        y1f = popt1[0] * np.sin(2*np.pi*x_fit/lam + popt1[1]) + popt1[2]
        y2f = popt2[0] * np.sin(2*np.pi*x_fit/lam + popt2[1]) + popt2[2]
        sse = np.sum((y_sof - y1f)**2) + np.sum((y_gai - y2f)**2)
    except Exception:
        sse = np.nan
    sse_scan.append(sse)

plt.figure(figsize=(9,4))
plt.plot(lam_scan, sse_scan)
plt.axvline(fit["lam"], color="red", ls="--", label=f"Best λ = {fit['lam']:.2f}")
plt.xlabel("Shared wavelength [kpc]")
plt.ylabel("Joint SSE")
plt.title("Joint shared-wavelength scan")
plt.legend()
plt.grid(True)
plt.tight_layout()
save_both("gaia_sofue_refined_fig2_sse_scan")
plt.show()

# Figure 3
plt.figure(figsize=(8,4))
plt.hist(boot["delta_R_kpc"], bins=40)
plt.axvline(delta_R, color="red", ls="--", label=f"Direct fit lag = {delta_R:.2f} kpc")
plt.axvline(lag_mean, color="black", ls="--", label=f"Bootstrap mean = {lag_mean:.2f} kpc")
plt.xlabel("Radial lag [kpc]")
plt.ylabel("Count")
plt.title("Refined bootstrap distribution of radial lag")
plt.legend()
plt.grid(True)
plt.tight_layout()
save_both("gaia_sofue_refined_fig3_bootstrap_lag")
plt.show()

# Figure 4
plt.figure(figsize=(8,4))
plt.hist(boot["delta_phi"], bins=40)
plt.axvline(delta_phi, color="red", ls="--", label=f"Direct Δφ = {delta_phi:.2f} rad")
plt.axvline(phi_mean, color="black", ls="--", label=f"Bootstrap mean = {phi_mean:.2f} rad")
plt.xlabel("Phase difference [rad]")
plt.ylabel("Count")
plt.title("Refined bootstrap distribution of phase difference")
plt.legend()
plt.grid(True)
plt.tight_layout()
save_both("gaia_sofue_refined_fig4_bootstrap_phase")
plt.show()

# ------------------------------------------------------------
# Save .py snapshot
# ------------------------------------------------------------
ip = get_ipython()
input_history = ip.user_ns.get("In", [])
current_cell_code = input_history[-1] if len(input_history) >= 1 else "# Current cell code not found."

py_name = "gaia_sofue_refined_shortmode_phase_test.py"
with open(py_name, "w", encoding="utf-8") as f:
    f.write(current_cell_code)

# ------------------------------------------------------------
# Zip bundle
# ------------------------------------------------------------
zip_name = "gaia_sofue_refined_shortmode_phase_test_bundle.zip"
with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for fn in (
        saved_pngs
        + saved_pdfs
        + [
            py_name,
            "gaia_sofue_refined_shortmode_fit_summary.csv",
            "gaia_sofue_refined_shortmode_bootstrap_samples.csv",
            "gaia_sofue_refined_shortmode_bootstrap_summary.csv",
        ]
    ):
        if os.path.exists(fn):
            zf.write(fn)

print("\nSaved:")
for fn in (
    saved_pngs
    + saved_pdfs
    + [
        py_name,
        "gaia_sofue_refined_shortmode_fit_summary.csv",
        "gaia_sofue_refined_shortmode_bootstrap_samples.csv",
        "gaia_sofue_refined_shortmode_bootstrap_summary.csv",
        zip_name
    ]
):
    if os.path.exists(fn):
        print(" -", fn)

files.download(zip_name)

Output (see figures 75-78)

Loaded local files from session.

=== Refined joint shared-mode fit ===
Shared lambda: 5.407 kpc
Sofue phase:   -0.768 rad
Gaia phase:    0.580 rad
Delta phase:   1.349 rad
Radial lag:    1.161 kpc
Sofue amplitude: 0.534
Gaia amplitude:  -0.055

=== Bootstrap summary ===
N successful bootstraps: 500
Lambda mean ± std: 5.402 ± 0.086 kpc
Lag mean ± std:    1.153 ± 0.225 kpc
Lag median [16,84]%: 1.148 [0.921, 1.381] kpc
Phase mean ± std:  1.338 ± 0.251 rad
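
As a consistency check on the numbers above, the reported radial lag follows directly from the phase-to-lag conversion used in the cell, delta_R = (delta_phi / 2π) * lambda:

```python
import numpy as np

def phase_to_radial_lag(delta_phi, lam):
    """Convert a phase offset [rad] of a wave of wavelength lam [kpc] to a radial shift [kpc]."""
    return (delta_phi / (2.0 * np.pi)) * lam

# Numbers from the refined joint fit printed above.
delta_phi = 0.580 - (-0.768)         # Gaia phase minus Sofue phase [rad]
lag = phase_to_radial_lag(delta_phi, 5.407)
print(f"Radial lag: {lag:.3f} kpc")  # ~1.16 kpc, matching the fit summary
```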

⸻



9.7 Milky Way Test — Newton vs RAR vs Newton + Wave

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import zipfile, os
from google.colab import files
from IPython import get_ipython

# ============================================================
# CLEANER MILKY WAY TEST:
# Newton vs RAR vs Newton+Wave
# Uses Sofue velocity curve directly
# ============================================================

# ------------------------------------------------------------
# 1) Sofue Milky Way rotation-curve data
# ------------------------------------------------------------
r = np.array([
    0.20, 0.50, 1.00, 1.50, 2.00, 2.50, 3.00, 3.50, 4.00,
    4.50, 5.00, 5.50, 6.00, 6.50, 7.00, 7.50, 8.00, 8.50,
    9.00, 9.50, 10.00, 11.00, 12.00, 13.00, 14.00, 15.00,
    16.00, 17.00, 18.00, 19.00, 20.00, 22.00, 24.00, 26.00,
    28.00, 30.00
])

v_obs = np.array([
    60.5, 90.2, 112.3, 140.1, 162.5, 178.3, 188.7, 197.2, 204.5,
    212.1, 219.0, 224.6, 229.3, 232.5, 234.7, 236.1, 237.0, 236.5,
    235.4, 234.0, 232.1, 228.5, 224.3, 219.2, 214.1, 209.2,
    204.0, 199.5, 195.0, 190.2, 185.6, 180.3, 175.4, 170.5,
    166.2, 162.1
])

# ------------------------------------------------------------
# 2) Build an explicit simple baryonic/Newtonian baseline
#
# Idea:
# - inner galaxy is baryon-dominated
# - fit a smooth saturating baryonic curve to inner radii only
# - then use that as v_bar(r)
#
# This is still simplified, but much cleaner than the previous proxy.
# ------------------------------------------------------------
def vbar_model(r, Vb, rb):
    return Vb * (1.0 - np.exp(-r / rb))

inner_mask = r <= 5.0
popt_bar, _ = curve_fit(
    vbar_model,
    r[inner_mask],
    v_obs[inner_mask],
    p0=[230.0, 2.0],
    bounds=([0.0, 0.1], [400.0, 20.0]),
    maxfev=20000
)

Vb_fit, rb_fit = popt_bar
v_bar = vbar_model(r, Vb_fit, rb_fit)

# accelerations in consistent arbitrary units
g_obs = (v_obs**2) / r
g_bar = (v_bar**2) / r

# ------------------------------------------------------------
# 3) Standard RAR prediction
#
# Use McGaugh-like interpolation in dimensionless form.
# Since our units are arbitrary here, fit g_dagger in same units.
# ------------------------------------------------------------
def rar_relation(gbar, gdag):
    x = np.clip(gbar / gdag, 1e-12, None)
    return gbar / (1.0 - np.exp(-np.sqrt(x)))

# fit gdag directly to the Milky Way points
popt_rar, _ = curve_fit(
    rar_relation,
    g_bar,
    g_obs,
    p0=[np.median(g_bar)],
    bounds=([1e-6], [1e8]),
    maxfev=20000
)
gdag_fit = popt_rar[0]
g_rar = rar_relation(g_bar, gdag_fit)

# convert back to velocity for plotting against Sofue curve
v_rar = np.sqrt(np.clip(g_rar * r, 0, None))

# ------------------------------------------------------------
# 4) Newtonian baseline in velocity space
# ------------------------------------------------------------
v_newton = v_bar.copy()

# ------------------------------------------------------------
# 5) Newton + fixed 5.4 kpc wave
#
# Wave enters acceleration, then convert back to velocity
# ------------------------------------------------------------
lambda_fixed = 5.4

def g_wave_fixed(r, A, phi):
    return g_bar + A * np.sin(2.0 * np.pi * r / lambda_fixed + phi)

popt_fixed, _ = curve_fit(
    g_wave_fixed,
    r,
    g_obs,
    p0=[0.1 * np.max(g_obs), 0.0],
    maxfev=20000
)
A_fixed, phi_fixed = popt_fixed
g_fixed = g_wave_fixed(r, A_fixed, phi_fixed)
v_fixed = np.sqrt(np.clip(g_fixed * r, 0, None))

# ------------------------------------------------------------
# 6) Newton + free-wave
# ------------------------------------------------------------
def g_wave_free(r, A, lam, phi):
    return g_bar + A * np.sin(2.0 * np.pi * r / lam + phi)

popt_free, _ = curve_fit(
    g_wave_free,
    r,
    g_obs,
    p0=[0.1 * np.max(g_obs), 5.4, 0.0],
    bounds=([0.0, 2.0, -10.0], [1e6, 20.0, 10.0]),
    maxfev=40000
)
A_free, lam_free, phi_free = popt_free
g_free = g_wave_free(r, A_free, lam_free, phi_free)
v_free = np.sqrt(np.clip(g_free * r, 0, None))

# ------------------------------------------------------------
# 7) Fit-quality metrics
# ------------------------------------------------------------
def rms(x):
    return np.sqrt(np.mean(x**2))

res_v_newton = v_obs - v_newton
res_v_rar    = v_obs - v_rar
res_v_fixed  = v_obs - v_fixed
res_v_free   = v_obs - v_free

rms_v_newton = rms(res_v_newton)
rms_v_rar    = rms(res_v_rar)
rms_v_fixed  = rms(res_v_fixed)
rms_v_free   = rms(res_v_free)

# acceleration residuals too
res_g_newton = g_obs - g_bar
res_g_rar    = g_obs - g_rar
res_g_fixed  = g_obs - g_fixed
res_g_free   = g_obs - g_free

rms_g_newton = rms(res_g_newton)
rms_g_rar    = rms(res_g_rar)
rms_g_fixed  = rms(res_g_fixed)
rms_g_free   = rms(res_g_free)

print("\n=== CLEAN MODEL COMPARISON ===")
print("Velocity-space RMS:")
print(f"  Newtonian        : {rms_v_newton:.3f}")
print(f"  RAR              : {rms_v_rar:.3f}")
print(f"  Newton + 5.4 wave: {rms_v_fixed:.3f}")
print(f"  Newton + freewave: {rms_v_free:.3f}")

print("\nAcceleration-space RMS:")
print(f"  Newtonian        : {rms_g_newton:.3f}")
print(f"  RAR              : {rms_g_rar:.3f}")
print(f"  Newton + 5.4 wave: {rms_g_fixed:.3f}")
print(f"  Newton + freewave: {rms_g_free:.3f}")

print("\n=== FIT PARAMETERS ===")
print(f"Baryonic baseline: Vb = {Vb_fit:.3f}, rb = {rb_fit:.3f}")
print(f"RAR gdag fit     : {gdag_fit:.6g}")
print(f"Fixed-wave fit   : A = {A_fixed:.3f}, lambda = {lambda_fixed:.3f} kpc, phi = {phi_fixed:.3f}")
print(f"Free-wave fit    : A = {A_free:.3f}, lambda = {lam_free:.3f} kpc, phi = {phi_free:.3f}")

# ------------------------------------------------------------
# 8) Plot velocity comparison
# ------------------------------------------------------------
fig1, ax = plt.subplots(figsize=(10, 6))
ax.plot(r, v_obs, 'ko', label="Observed (Sofue)")
ax.plot(r, v_newton, '--', label="Newtonian baryonic baseline")
ax.plot(r, v_rar, label="RAR")
ax.plot(r, v_fixed, label="Newton + fixed 5.4 kpc wave")
ax.plot(r, v_free, label="Newton + free-wave")
ax.set_xlabel("Radius [kpc]")
ax.set_ylabel("Velocity [km/s]")
ax.set_title("Milky Way velocity comparison: Newton vs RAR vs wave")
ax.legend()
ax.grid(True)
plt.tight_layout()

# ------------------------------------------------------------
# 9) Plot residual comparison
# ------------------------------------------------------------
fig2, axes = plt.subplots(2, 1, figsize=(10, 8), sharex=True)

axes[0].plot(r, res_v_newton, 'o-', label="Newtonian")
axes[0].plot(r, res_v_rar, 'o-', label="RAR")
axes[0].plot(r, res_v_fixed, 'o-', label="Newton + 5.4 wave")
axes[0].plot(r, res_v_free, 'o-', label="Newton + free-wave")
axes[0].axhline(0, ls='--', c='gray')
axes[0].set_ylabel("Velocity residual [km/s]")
axes[0].set_title("Velocity residuals")
axes[0].legend()
axes[0].grid(True)

axes[1].plot(r, res_g_newton, 'o-', label="Newtonian")
axes[1].plot(r, res_g_rar, 'o-', label="RAR")
axes[1].plot(r, res_g_fixed, 'o-', label="Newton + 5.4 wave")
axes[1].plot(r, res_g_free, 'o-', label="Newton + free-wave")
axes[1].axhline(0, ls='--', c='gray')
axes[1].set_xlabel("Radius [kpc]")
axes[1].set_ylabel("Acceleration residual [arb units]")
axes[1].set_title("Acceleration residuals")
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()

# ------------------------------------------------------------
# 10) Save outputs
# ------------------------------------------------------------
fig1_png = "mw_newton_rar_wave_velocity.png"
fig1_pdf = "mw_newton_rar_wave_velocity.pdf"
fig2_png = "mw_newton_rar_wave_residuals.png"
fig2_pdf = "mw_newton_rar_wave_residuals.pdf"

fig1.savefig(fig1_png, dpi=300, bbox_inches="tight")
fig1.savefig(fig1_pdf, bbox_inches="tight")
fig2.savefig(fig2_png, dpi=300, bbox_inches="tight")
fig2.savefig(fig2_pdf, bbox_inches="tight")

plt.show()

# summary csv
df = pd.DataFrame({
    "r_kpc": r,
    "v_obs": v_obs,
    "v_newton": v_newton,
    "v_rar": v_rar,
    "v_wave_fixed": v_fixed,
    "v_wave_free": v_free,
    "res_v_newton": res_v_newton,
    "res_v_rar": res_v_rar,
    "res_v_wave_fixed": res_v_fixed,
    "res_v_wave_free": res_v_free,
    "g_obs": g_obs,
    "g_bar": g_bar,
    "g_rar": g_rar,
    "g_wave_fixed": g_fixed,
    "g_wave_free": g_free,
    "res_g_newton": res_g_newton,
    "res_g_rar": res_g_rar,
    "res_g_wave_fixed": res_g_fixed,
    "res_g_wave_free": res_g_free,
})
csv_name = "mw_newton_rar_wave_comparison.csv"
df.to_csv(csv_name, index=False)

metrics = pd.DataFrame([{
    "rms_v_newton": rms_v_newton,
    "rms_v_rar": rms_v_rar,
    "rms_v_wave_fixed": rms_v_fixed,
    "rms_v_wave_free": rms_v_free,
    "rms_g_newton": rms_g_newton,
    "rms_g_rar": rms_g_rar,
    "rms_g_wave_fixed": rms_g_fixed,
    "rms_g_wave_free": rms_g_free,
    "Vb_fit": Vb_fit,
    "rb_fit": rb_fit,
    "gdag_fit": gdag_fit,
    "A_fixed": A_fixed,
    "phi_fixed": phi_fixed,
    "A_free": A_free,
    "lam_free": lam_free,
    "phi_free": phi_free
}])
metrics_csv = "mw_newton_rar_wave_metrics.csv"
metrics.to_csv(metrics_csv, index=False)

# save code snapshot from current cell
ip = get_ipython()
input_history = ip.user_ns.get("In", [])
current_cell_code = input_history[-1] if len(input_history) >= 1 else "# Current cell code not found."
py_name = "mw_newton_rar_wave_test.py"
with open(py_name, "w", encoding="utf-8") as f:
    f.write(current_cell_code)

# zip everything
zip_name = "mw_newton_rar_wave_test_bundle.zip"
with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for fn in [
        fig1_png, fig1_pdf, fig2_png, fig2_pdf,
        csv_name, metrics_csv, py_name
    ]:
        if os.path.exists(fn):
            zf.write(fn)

print("\nSaved:")
for fn in [fig1_png, fig1_pdf, fig2_png, fig2_pdf, csv_name, metrics_csv, py_name, zip_name]:
    if os.path.exists(fn):
        print(" -", fn)

files.download(zip_name)

Output (see figure 79)



=== CLEAN MODEL COMPARISON ===
Velocity-space RMS:
  Newtonian        : 21.945
  RAR              : 39.210
  Newton + 5.4 wave: 33.382
  Newton + freewave: 34.599

Acceleration-space RMS:
  Newtonian        : 2605.802
  RAR              : 2501.982
  Newton + 5.4 wave: 2469.914
  Newton + freewave: 2473.032

=== FIT PARAMETERS ===
Baryonic baseline: Vb = 212.210, rb = 1.237
RAR gdag fit     : 1332.86
Fixed-wave fit   : A = 1158.402, lambda = 5.400 kpc, phi = 1.438
Free-wave fit    : A = 1158.619, lambda = 5.709 kpc, phi = 1.428
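
For context on the RAR comparison above: the interpolation function has two standard limits, g_obs → g_bar when g_bar ≫ g† (Newtonian regime) and g_obs → sqrt(g_bar · g†) when g_bar ≪ g†. A quick numerical check of the form used in the cell:

```python
import numpy as np

def rar_relation(gbar, gdag):
    # McGaugh-style interpolation: g_obs = g_bar / (1 - exp(-sqrt(g_bar / g_dag)))
    x = np.clip(gbar / gdag, 1e-12, None)
    return gbar / (1.0 - np.exp(-np.sqrt(x)))

gdag = 3702.8  # canonical 1.2e-10 m/s^2 expressed in (km/s)^2 / kpc

# High-acceleration (Newtonian) limit: g_obs ≈ g_bar
gbar_hi = 1e3 * gdag
ratio_hi = rar_relation(gbar_hi, gdag) / gbar_hi
print(f"high-g ratio: {ratio_hi:.6f}")   # ~1.0

# Low-acceleration limit: g_obs ≈ sqrt(g_bar * gdag)
gbar_lo = 1e-4 * gdag
ratio_lo = rar_relation(gbar_lo, gdag) / np.sqrt(gbar_lo * gdag)
print(f"low-g ratio:  {ratio_lo:.6f}")   # ~1.005
```

Note that the Milky Way fit above uses accelerations in arbitrary units with g† fitted rather than fixed, so only the functional shape, not the canonical g† value, enters that comparison.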

⸻

9.8 SPARC Galaxy Test — Smooth Gaia ABS Support vs RAR


# ============================================================
# SPARC GALAXY TEST — SMOOTH GAIA ABS SUPPORT vs RAR
# CLEAN FULL CELL
#
# What it does:
#   1) Forces an upload prompt
#   2) Reads one SPARC rotmod file
#   3) Builds:
#        - Baryonic only
#        - Canonical RAR
#        - Smooth Gaia ABS support
#   4) Compares them in velocity and acceleration
#   5) Saves PNG + PDF + CSV + PY + ZIP
# ============================================================

import os
import zipfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d
from IPython import get_ipython
from google.colab import files

# ------------------------------------------------------------
# 0) FORCE upload prompt
# ------------------------------------------------------------
print("Upload one SPARC rotmod file now (for example: NGC3109_rotmod.dat)")
uploaded = files.upload()

if len(uploaded) == 0:
    raise ValueError("No file uploaded.")

rotmod_path = list(uploaded.keys())[0]
print(f"Using file: {rotmod_path}")

gal_name = os.path.basename(rotmod_path)
gal_name = gal_name.replace("_rotmod.dat", "")
gal_name = gal_name.replace(".dat", "")

# ------------------------------------------------------------
# 1) Read SPARC rotmod robustly
# Expected numeric columns usually:
#   R  Vobs  eV  Vgas  Vdisk  Vbul  SBdisk  SBbul
# ------------------------------------------------------------
rows = []

with open(rotmod_path, "r", encoding="utf-8", errors="ignore") as f:
    for line in f:
        s = line.strip()
        if (not s) or s.startswith("#"):
            continue

        parts = s.replace(",", " ").split()
        nums = []

        for p in parts:
            try:
                nums.append(float(p))
            except ValueError:
                pass

        if len(nums) >= 6:
            if len(nums) < 8:
                nums = nums + [np.nan] * (8 - len(nums))
            rows.append(nums[:8])

if len(rows) < 5:
    raise ValueError("Could not parse enough numeric rows from uploaded file.")

df = pd.DataFrame(
    rows,
    columns=[
        "R_kpc",
        "Vobs_kms",
        "eV_kms",
        "Vgas_kms",
        "Vdisk_kms",
        "Vbul_kms",
        "SBdisk",
        "SBbul",
    ],
)

df = df[np.isfinite(df["R_kpc"]) & np.isfinite(df["Vobs_kms"]) & (df["R_kpc"] > 0)].copy()
df = df.sort_values("R_kpc").reset_index(drop=True)

df["eV_kms"] = df["eV_kms"].replace(0, np.nan)
if df["eV_kms"].dropna().empty:
    df["eV_kms"] = 5.0
else:
    df["eV_kms"] = df["eV_kms"].fillna(df["eV_kms"].median())

for col in ["Vgas_kms", "Vdisk_kms", "Vbul_kms"]:
    df[col] = df[col].fillna(0.0)

r = df["R_kpc"].to_numpy()
v_obs = df["Vobs_kms"].to_numpy()
v_err = df["eV_kms"].to_numpy()
v_gas = df["Vgas_kms"].to_numpy()
v_disk = df["Vdisk_kms"].to_numpy()
v_bul = df["Vbul_kms"].to_numpy()

Rmin = float(np.min(r))
Rmax = float(np.max(r))

# ------------------------------------------------------------
# 2) Baryonic-only curve from SPARC components
# Fixed M/L values
# ------------------------------------------------------------
yd_fixed = 0.5
yb_fixed = 0.7

v_disk_eff = np.sqrt(np.clip(yd_fixed * v_disk**2, 0.0, None))
v_bul_eff = np.sqrt(np.clip(yb_fixed * v_bul**2, 0.0, None))
v_bar2 = np.abs(v_gas) * v_gas + yd_fixed * (v_disk**2) + yb_fixed * (v_bul**2)  # |v_gas|*v_gas preserves the sign of negative gas contributions
v_bar = np.sqrt(np.clip(v_bar2, 0.0, None))

# ------------------------------------------------------------
# 3) Accelerations
# Units: (km/s)^2 / kpc
# ------------------------------------------------------------
g_obs = v_obs**2 / r
g_bar = v_bar**2 / r
delta_g_required = g_obs - g_bar

# ------------------------------------------------------------
# 4) Canonical fixed RAR
# gdag = 1.20e-10 m/s^2 = ~3702.84 (km/s)^2/kpc
# ------------------------------------------------------------
gdag_fixed = 3.0857e19 * 1.20e-10 / 1e6

def rar_relation(gbar, gdag):
    x = np.clip(gbar / gdag, 1e-12, None)
    return gbar / (1.0 - np.exp(-np.sqrt(x)))

g_rar = rar_relation(g_bar, gdag_fixed)
v_rar = np.sqrt(np.clip(g_rar * r, 0.0, None))
delta_g_rar = g_rar - g_bar

# ------------------------------------------------------------
# 5) Smooth Gaia ABS support envelope
# Gaia scale locked from MW:
#   lambda_fraction = 5.4 / 30
# ------------------------------------------------------------
lambda_fraction = 5.4 / 30.0
lambda_eff = lambda_fraction * Rmax

# Fine grid so the template is smooth
n_fine = 3000
r_fine = np.linspace(Rmin, Rmax, n_fine)
theta_fine = 2.0 * np.pi * r_fine / lambda_eff

raw_abs_fine = np.abs(np.sin(theta_fine))

dr_fine = r_fine[1] - r_fine[0]
smooth_scale_kpc = max(lambda_eff / 4.0, 2.0 * dr_fine)
sigma_samples = max(smooth_scale_kpc / dr_fine, 1.0)

smooth_abs_fine = gaussian_filter1d(raw_abs_fine, sigma=sigma_samples, mode="nearest")

peak_abs = np.max(smooth_abs_fine)
if peak_abs > 0:
    smooth_abs_fine = smooth_abs_fine / peak_abs
else:
    smooth_abs_fine = np.ones_like(smooth_abs_fine)

support_abs_template = np.interp(r, r_fine, smooth_abs_fine)

# ------------------------------------------------------------
# 6) Fit one amplitude directly:
#    delta_g_required ≈ A_abs * support_abs_template
# ------------------------------------------------------------
denom = np.sum(support_abs_template**2)
if denom <= 0:
    A_abs = 0.0
else:
    A_abs = max(0.0, np.sum(support_abs_template * delta_g_required) / denom)

delta_g_abs = A_abs * support_abs_template
g_abs_total = g_bar + delta_g_abs
v_abs_total = np.sqrt(np.clip(g_abs_total * r, 0.0, None))

delta_g_abs_fine = A_abs * smooth_abs_fine

# ------------------------------------------------------------
# 7) Metrics
# ------------------------------------------------------------
def rms(x):
    return float(np.sqrt(np.mean(x**2)))

res_v_bar = v_obs - v_bar
res_v_rar = v_obs - v_rar
res_v_abs = v_obs - v_abs_total

res_g_bar = g_obs - g_bar
res_g_rar = g_obs - g_rar
res_g_abs = g_obs - g_abs_total

print(f"\n=== {gal_name} — SMOOTH GAIA ABS SUPPORT TEST ===")
print(f"Disk M/L fixed      : {yd_fixed:.4f}")
print(f"Bulge M/L fixed     : {yb_fixed:.4f}")
print(f"RAR gdag fixed      : {gdag_fixed:.4f} (km/s)^2/kpc")
print(f"Galaxy Rmax         : {Rmax:.4f} kpc")
print(f"Gaia lambda fraction: {lambda_fraction:.6f}")
print(f"Gaia-locked lambda  : {lambda_eff:.4f} kpc")
print(f"Gaussian smoothing  : {smooth_scale_kpc:.4f} kpc")
print(f"A_abs               : {A_abs:.4f}")

print("\nVelocity RMS:")
print(f"  Baryonic only     : {rms(res_v_bar):.3f}")
print(f"  RAR total         : {rms(res_v_rar):.3f}")
print(f"  Gaia ABS support  : {rms(res_v_abs):.3f}")

print("\nAcceleration RMS:")
print(f"  Baryonic only     : {rms(res_g_bar):.3f}")
print(f"  RAR total         : {rms(res_g_rar):.3f}")
print(f"  Gaia ABS support  : {rms(res_g_abs):.3f}")

# ------------------------------------------------------------
# 8) Plots
# ------------------------------------------------------------
plt.rcParams.update({
    "figure.dpi": 140,
    "savefig.dpi": 300,
    "axes.grid": True,
})

# Figure 1: total curves
fig1, ax1 = plt.subplots(figsize=(10, 6))
ax1.errorbar(r, v_obs, yerr=v_err, fmt="ko", ms=4, capsize=2, label="Observed")
ax1.plot(r, np.abs(v_gas), "--", lw=1.2, label="Gas component")
ax1.plot(r, v_disk_eff, "--", lw=1.2, label="Disk component (weighted)")
ax1.plot(r, v_bul_eff, "--", lw=1.2, label="Bulge component (weighted)")
ax1.plot(r, v_bar, lw=2.0, label="Baryonic total")
ax1.plot(r, v_rar, lw=2.2, label="RAR total")
ax1.plot(r, v_abs_total, lw=2.2, label="Gaia ABS support")
ax1.set_xlabel("Radius [kpc]")
ax1.set_ylabel("Velocity [km/s]")
ax1.set_title(f"{gal_name} — smooth Gaia ABS support vs RAR")
ax1.legend(loc="best")
plt.tight_layout()

# Figure 2: correction comparison
fig2, axes2 = plt.subplots(2, 1, figsize=(10, 8), sharex=True)

axes2[0].plot(r, delta_g_required, "ko-", label="Required correction = g_obs - g_bar")
axes2[0].plot(r, delta_g_rar, "o-", label="RAR correction")
axes2[0].plot(r, delta_g_abs, "o-", label="Gaia ABS support")
axes2[0].axhline(0, ls="--", c="gray")
axes2[0].set_ylabel("Extra acceleration")
axes2[0].set_title("Required correction vs Gaia ABS support vs RAR")
axes2[0].legend(loc="best")

axes2[1].plot(r_fine, smooth_abs_fine, label="ABS smooth envelope")
axes2[1].set_xlabel("Radius [kpc]")
axes2[1].set_ylabel("Normalised support shape")
axes2[1].set_title("Smooth Gaia ABS envelope")
axes2[1].legend(loc="best")

plt.tight_layout()

# Figure 3: residuals
fig3, axes3 = plt.subplots(2, 1, figsize=(10, 8), sharex=True)

axes3[0].plot(r, res_v_bar, "o-", label="Baryonic only")
axes3[0].plot(r, res_v_rar, "o-", label="RAR total")
axes3[0].plot(r, res_v_abs, "o-", label="Gaia ABS support")
axes3[0].axhline(0, ls="--", c="gray")
axes3[0].set_ylabel("Velocity residual [km/s]")
axes3[0].set_title("Velocity residuals")
axes3[0].legend(loc="best")

axes3[1].plot(r, res_g_bar, "o-", label="Baryonic only")
axes3[1].plot(r, res_g_rar, "o-", label="RAR total")
axes3[1].plot(r, res_g_abs, "o-", label="Gaia ABS support")
axes3[1].axhline(0, ls="--", c="gray")
axes3[1].set_xlabel("Radius [kpc]")
axes3[1].set_ylabel("Acceleration residual")
axes3[1].set_title("Acceleration residuals")
axes3[1].legend(loc="best")

plt.tight_layout()
plt.show()

# ------------------------------------------------------------
# 9) Save outputs
# ------------------------------------------------------------
base = f"{gal_name}_smooth_gaia_abs_support_test"

fig1_png = f"{base}_curves.png"
fig1_pdf = f"{base}_curves.pdf"
fig2_png = f"{base}_supports.png"
fig2_pdf = f"{base}_supports.pdf"
fig3_png = f"{base}_residuals.png"
fig3_pdf = f"{base}_residuals.pdf"

fig1.savefig(fig1_png, bbox_inches="tight")
fig1.savefig(fig1_pdf, bbox_inches="tight")
fig2.savefig(fig2_png, bbox_inches="tight")
fig2.savefig(fig2_pdf, bbox_inches="tight")
fig3.savefig(fig3_png, bbox_inches="tight")
fig3.savefig(fig3_pdf, bbox_inches="tight")

out_df = pd.DataFrame({
    "r_kpc": r,
    "v_obs_kms": v_obs,
    "eV_kms": v_err,
    "v_gas_kms": v_gas,
    "v_disk_raw_kms": v_disk,
    "v_bul_raw_kms": v_bul,
    "v_disk_eff_kms": v_disk_eff,
    "v_bul_eff_kms": v_bul_eff,
    "v_bar_kms": v_bar,
    "v_rar_kms": v_rar,
    "v_abs_total_kms": v_abs_total,
    "g_obs": g_obs,
    "g_bar": g_bar,
    "delta_g_required": delta_g_required,
    "delta_g_rar": delta_g_rar,
    "delta_g_abs": delta_g_abs,
    "support_abs_template": support_abs_template,
    "res_v_bar": res_v_bar,
    "res_v_rar": res_v_rar,
    "res_v_abs": res_v_abs,
    "res_g_bar": res_g_bar,
    "res_g_rar": res_g_rar,
    "res_g_abs": res_g_abs,
})
csv_name = f"{base}_table.csv"
out_df.to_csv(csv_name, index=False)

metrics_df = pd.DataFrame([{
    "galaxy": gal_name,
    "yd_fixed": yd_fixed,
    "yb_fixed": yb_fixed,
    "gdag_fixed": gdag_fixed,
    "Rmax_kpc": Rmax,
    "lambda_fraction": lambda_fraction,
    "lambda_eff_kpc": lambda_eff,
    "smooth_scale_kpc": smooth_scale_kpc,
    "A_abs": A_abs,
    "rms_v_bar": rms(res_v_bar),
    "rms_v_rar": rms(res_v_rar),
    "rms_v_abs": rms(res_v_abs),
    "rms_g_bar": rms(res_g_bar),
    "rms_g_rar": rms(res_g_rar),
    "rms_g_abs": rms(res_g_abs),
}])
metrics_csv = f"{base}_metrics.csv"
metrics_df.to_csv(metrics_csv, index=False)

# Save code snapshot
ip = get_ipython()
input_history = ip.user_ns.get("In", []) if ip is not None else []
current_cell_code = input_history[-1] if input_history else "# Current cell code not found."
py_name = f"{base}.py"
with open(py_name, "w", encoding="utf-8") as f:
    f.write(current_cell_code)

zip_name = f"{base}_bundle.zip"
with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for fn in [
        fig1_png,
        fig1_pdf,
        fig2_png,
        fig2_pdf,
        fig3_png,
        fig3_pdf,
        csv_name,
        metrics_csv,
        py_name,
    ]:
        if os.path.exists(fn):
            zf.write(fn)

print("\nSaved:")
for fn in [
    fig1_png,
    fig1_pdf,
    fig2_png,
    fig2_pdf,
    fig3_png,
    fig3_pdf,
    csv_name,
    metrics_csv,
    py_name,
    zip_name,
]:
    if os.path.exists(fn):
        print(" -", fn)

files.download(zip_name)

✅✅✅ Output

(See figures 80-89)

⸻

9.9 SPARC Galaxy Test — Gaia ABS Support on the RAR Plane

# ============================================================
# SPARC GALAXY TEST — GAIA ABS SUPPORT ON THE RAR PLANE
# FULL CORRECTED CELL
#
# Goal:
#   Plot the galaxy on the McGaugh-style RAR plane:
#       x = log10(g_bar)
#       y = log10(g_obs)
#
#   and compare against:
#       y = log10(g_RAR)
#       y = log10(g_GaiaABS)
#
# Models:
#   1) Baryonic acceleration g_bar
#   2) Observed acceleration g_obs
#   3) Canonical RAR acceleration g_rar
#   4) Gaia ABS support acceleration g_gaia = g_bar + delta_g_abs
#
# Gaia scale locked from MW:
#   lambda_fraction = 5.4 / 30.0
#
# This cell:
#   - forces file upload
#   - reads one SPARC rotmod file
#   - builds the smooth Gaia ABS support model
#   - makes a McGaugh-style RAR-plane figure
#   - saves PNG + PDF + CSV + PY + ZIP
# ============================================================

import os
import zipfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter1d
from IPython import get_ipython
from google.colab import files

# ------------------------------------------------------------
# 0) FORCE upload prompt
# ------------------------------------------------------------
print("Upload one SPARC rotmod file now (for example: NGC3109_rotmod.dat)")
uploaded = files.upload()

if len(uploaded) == 0:
    raise ValueError("No file uploaded.")

rotmod_path = list(uploaded.keys())[0]
print(f"Using file: {rotmod_path}")

gal_name = os.path.basename(rotmod_path).replace("_rotmod.dat", "").replace(".dat", "")

# ------------------------------------------------------------
# 1) Read SPARC rotmod robustly
# ------------------------------------------------------------
rows = []
with open(rotmod_path, "r", encoding="utf-8", errors="ignore") as f:
    for line in f:
        s = line.strip()
        if (not s) or s.startswith("#"):
            continue

        parts = s.replace(",", " ").split()
        nums = []
        for p in parts:
            try:
                nums.append(float(p))
            except ValueError:
                pass  # skip non-numeric tokens

        if len(nums) >= 6:
            if len(nums) < 8:
                nums = nums + [np.nan] * (8 - len(nums))
            rows.append(nums[:8])

if len(rows) < 5:
    raise ValueError("Could not parse enough numeric rows from uploaded file.")

df = pd.DataFrame(
    rows,
    columns=[
        "R_kpc",
        "Vobs_kms",
        "eV_kms",
        "Vgas_kms",
        "Vdisk_kms",
        "Vbul_kms",
        "SBdisk",
        "SBbul",
    ],
)

df = df[np.isfinite(df["R_kpc"]) & np.isfinite(df["Vobs_kms"]) & (df["R_kpc"] > 0)].copy()
df = df.sort_values("R_kpc").reset_index(drop=True)

df["eV_kms"] = df["eV_kms"].replace(0, np.nan)
if df["eV_kms"].dropna().empty:
    df["eV_kms"] = 5.0
else:
    df["eV_kms"] = df["eV_kms"].fillna(df["eV_kms"].median())

for col in ["Vgas_kms", "Vdisk_kms", "Vbul_kms"]:
    df[col] = df[col].fillna(0.0)

r = df["R_kpc"].to_numpy()
v_obs = df["Vobs_kms"].to_numpy()
v_err = df["eV_kms"].to_numpy()
v_gas = df["Vgas_kms"].to_numpy()
v_disk = df["Vdisk_kms"].to_numpy()
v_bul = df["Vbul_kms"].to_numpy()

Rmin = float(np.min(r))
Rmax = float(np.max(r))

# ------------------------------------------------------------
# 2) Baryonic-only curve from SPARC components
# Fixed M/L values
# ------------------------------------------------------------
yd_fixed = 0.5
yb_fixed = 0.7

v_disk_eff = np.sqrt(np.clip(yd_fixed * v_disk**2, 0.0, None))
v_bul_eff = np.sqrt(np.clip(yb_fixed * v_bul**2, 0.0, None))
# Gas term keeps its sign (SPARC gas velocities can be negative near the centre)
v_bar2 = np.abs(v_gas) * v_gas + yd_fixed * (v_disk**2) + yb_fixed * (v_bul**2)
v_bar = np.sqrt(np.clip(v_bar2, 0.0, None))

# ------------------------------------------------------------
# 3) Accelerations in SPARC internal units: (km/s)^2 / kpc
# ------------------------------------------------------------
g_obs = v_obs**2 / r
g_bar = v_bar**2 / r
delta_g_required = g_obs - g_bar

# ------------------------------------------------------------
# 4) Canonical fixed RAR
# gdag = 1.20e-10 m/s^2 = ~3702.84 (km/s)^2/kpc
# ------------------------------------------------------------
gdag_fixed = 3.0857e19 * 1.20e-10 / 1e6
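# Sanity check on the unit conversion above (assumes 1 kpc = 3.0857e19 m):
# (1.20e-10 m/s^2) * (m per kpc) / (1e6 m^2/s^2 per (km/s)^2) = 3702.84 (km/s)^2/kpc
assert abs(3.0857e19 * 1.20e-10 / 1e6 - 3702.84) < 0.01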

def rar_relation(gbar, gdag):
    x = np.clip(gbar / gdag, 1e-12, None)
    return gbar / (1.0 - np.exp(-np.sqrt(x)))

g_rar = rar_relation(g_bar, gdag_fixed)
v_rar = np.sqrt(np.clip(g_rar * r, 0.0, None))
delta_g_rar = g_rar - g_bar

# ------------------------------------------------------------
# 5) Smooth Gaia ABS support envelope
# ------------------------------------------------------------
lambda_fraction = 5.4 / 30.0
lambda_eff = lambda_fraction * Rmax

n_fine = 3000
r_fine = np.linspace(Rmin, Rmax, n_fine)
theta_fine = 2.0 * np.pi * r_fine / lambda_eff

raw_abs_fine = np.abs(np.sin(theta_fine))

dr_fine = r_fine[1] - r_fine[0]
smooth_scale_kpc = max(lambda_eff / 4.0, 2.0 * dr_fine)
sigma_samples = max(smooth_scale_kpc / dr_fine, 1.0)

smooth_abs_fine = gaussian_filter1d(raw_abs_fine, sigma=sigma_samples, mode="nearest")

peak_abs = np.max(smooth_abs_fine)
if peak_abs > 0:
    smooth_abs_fine = smooth_abs_fine / peak_abs
else:
    smooth_abs_fine = np.ones_like(smooth_abs_fine)

support_abs_template = np.interp(r, r_fine, smooth_abs_fine)

# Closed-form least-squares amplitude for the support template (clipped at zero)
denom = np.sum(support_abs_template**2)
if denom <= 0:
    A_abs = 0.0
else:
    A_abs = max(0.0, np.sum(support_abs_template * delta_g_required) / denom)

delta_g_abs = A_abs * support_abs_template
g_abs = g_bar + delta_g_abs
v_abs = np.sqrt(np.clip(g_abs * r, 0.0, None))
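# Self-contained check (toy numbers) that the closed-form amplitude used above,
# A = sum(t * dg) / sum(t^2), is the least-squares minimiser of |dg - A * t|:
import numpy as np  # already imported above; repeated so this check runs on its own
_t = np.array([0.2, 0.5, 1.0])
_dg = np.array([10.0, 30.0, 55.0])
_A = np.sum(_t * _dg) / np.sum(_t ** 2)
for _trial in (_A * 0.9, _A * 1.1):
    assert np.sum((_dg - _A * _t) ** 2) < np.sum((_dg - _trial * _t) ** 2)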

# ------------------------------------------------------------
# 6) Convert to McGaugh-style SI units for the RAR plane
#     1 (km/s)^2/kpc = 3.24078e-14 m/s^2
# ------------------------------------------------------------
conv_to_mps2 = 3.24078e-14
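# Consistency check: this constant is the inverse of the kpc-in-metres value used
# for gdag_fixed, scaled by 1e6 to go from (km/s)^2 to (m/s)^2: 1e6 / 3.0857e19
assert abs(1e6 / 3.0857e19 - 3.24078e-14) < 1e-18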

g_bar_si = g_bar * conv_to_mps2
g_obs_si = g_obs * conv_to_mps2
g_rar_si = g_rar * conv_to_mps2
g_abs_si = g_abs * conv_to_mps2

# keep only positive finite values for logs
mask = (
    np.isfinite(g_bar_si) & np.isfinite(g_obs_si) &
    np.isfinite(g_rar_si) & np.isfinite(g_abs_si) &
    (g_bar_si > 0) & (g_obs_si > 0) &
    (g_rar_si > 0) & (g_abs_si > 0)
)

x = np.log10(g_bar_si[mask])
y_obs = np.log10(g_obs_si[mask])
y_rar = np.log10(g_rar_si[mask])
y_abs = np.log10(g_abs_si[mask])

order = np.argsort(x)
x_s = x[order]
y_obs_s = y_obs[order]
y_rar_s = y_rar[order]
y_abs_s = y_abs[order]

x_line = np.linspace(min(x_s.min(), -13.5), max(x_s.max(), -9.0), 300)
y_unity = x_line

# ------------------------------------------------------------
# 7) Metrics
# ------------------------------------------------------------
def rms(arr):
    return float(np.sqrt(np.mean(arr**2)))

res_v_bar = v_obs - v_bar
res_v_rar = v_obs - v_rar
res_v_abs = v_obs - v_abs

res_g_bar = g_obs - g_bar
res_g_rar = g_obs - g_rar
res_g_abs = g_obs - g_abs

rar_plane_rms_rar = rms(y_obs_s - y_rar_s)
rar_plane_rms_abs = rms(y_obs_s - y_abs_s)

print(f"\n=== {gal_name} — GAIA ABS SUPPORT ON THE RAR PLANE ===")
print(f"Disk M/L fixed      : {yd_fixed:.4f}")
print(f"Bulge M/L fixed     : {yb_fixed:.4f}")
print(f"RAR gdag fixed      : {gdag_fixed:.4f} (km/s)^2/kpc")
print(f"Galaxy Rmax         : {Rmax:.4f} kpc")
print(f"Gaia lambda fraction: {lambda_fraction:.6f}")
print(f"Gaia-locked lambda  : {lambda_eff:.4f} kpc")
print(f"Gaussian smoothing  : {smooth_scale_kpc:.4f} kpc")
print(f"A_abs               : {A_abs:.4f}")

print("\nVelocity RMS:")
print(f"  Baryonic only     : {rms(res_v_bar):.3f}")
print(f"  RAR total         : {rms(res_v_rar):.3f}")
print(f"  Gaia ABS support  : {rms(res_v_abs):.3f}")

print("\nAcceleration RMS:")
print(f"  Baryonic only     : {rms(res_g_bar):.3f}")
print(f"  RAR total         : {rms(res_g_rar):.3f}")
print(f"  Gaia ABS support  : {rms(res_g_abs):.3f}")

print("\nRAR-plane RMS in log10(m/s^2):")
print(f"  RAR line          : {rar_plane_rms_rar:.4f}")
print(f"  Gaia ABS line     : {rar_plane_rms_abs:.4f}")

# ------------------------------------------------------------
# 8) PLOTS
# ------------------------------------------------------------
plt.rcParams.update({
    "figure.dpi": 140,
    "savefig.dpi": 300,
    "axes.grid": True,
})

# Figure 1: McGaugh-style RAR plane
fig1, ax1 = plt.subplots(figsize=(8, 7))
ax1.plot(x_line, y_unity, ":", color="gray", label="Unity ($g_{obs}=g_{bar}$)")
ax1.scatter(x, y_obs, color="black", s=24, label="Observed points")
ax1.plot(x_s, y_rar_s, color="tab:purple", lw=2.2, label="RAR prediction")
ax1.plot(x_s, y_abs_s, color="tab:brown", lw=2.2, label="Gaia ABS prediction")
ax1.set_xlabel(r"$\log_{10}(g_{\rm bar}\ [{\rm m\ s^{-2}}])$")
ax1.set_ylabel(r"$\log_{10}(g\ [{\rm m\ s^{-2}}])$")
ax1.set_title(f"{gal_name} — RAR plane: observed vs RAR vs Gaia ABS")
ax1.legend(loc="best")
plt.tight_layout()

# Figure 2: velocity curves
fig2, ax2 = plt.subplots(figsize=(10, 6))
ax2.errorbar(r, v_obs, yerr=v_err, fmt="ko", ms=4, capsize=2, label="Observed")
ax2.plot(r, v_bar, lw=2.0, label="Baryonic total")
ax2.plot(r, v_rar, lw=2.2, label="RAR total")
ax2.plot(r, v_abs, lw=2.2, label="Gaia ABS support")
ax2.set_xlabel("Radius [kpc]")
ax2.set_ylabel("Velocity [km/s]")
ax2.set_title(f"{gal_name} — velocity curves")
ax2.legend(loc="best")
plt.tight_layout()

# Figure 3: correction comparison + residuals
fig3, axes3 = plt.subplots(3, 1, figsize=(10, 11), sharex=True)

axes3[0].plot(r, delta_g_required, "ko-", label="Required correction = g_obs - g_bar")
axes3[0].plot(r, delta_g_rar, "o-", label="RAR correction")
axes3[0].plot(r, delta_g_abs, "o-", label="Gaia ABS support")
axes3[0].axhline(0, ls="--", c="gray")
axes3[0].set_ylabel("Extra acceleration")
axes3[0].set_title("Required correction vs RAR vs Gaia ABS")
axes3[0].legend(loc="best")

axes3[1].plot(r, res_v_bar, "o-", label="Baryonic only")
axes3[1].plot(r, res_v_rar, "o-", label="RAR total")
axes3[1].plot(r, res_v_abs, "o-", label="Gaia ABS support")
axes3[1].axhline(0, ls="--", c="gray")
axes3[1].set_ylabel("Velocity residual [km/s]")
axes3[1].set_title("Velocity residuals")
axes3[1].legend(loc="best")

axes3[2].plot(r, res_g_bar, "o-", label="Baryonic only")
axes3[2].plot(r, res_g_rar, "o-", label="RAR total")
axes3[2].plot(r, res_g_abs, "o-", label="Gaia ABS support")
axes3[2].axhline(0, ls="--", c="gray")
axes3[2].set_xlabel("Radius [kpc]")
axes3[2].set_ylabel("Acceleration residual")
axes3[2].set_title("Acceleration residuals")
axes3[2].legend(loc="best")

plt.tight_layout()
plt.show()

# ------------------------------------------------------------
# 9) Save outputs
# ------------------------------------------------------------
base = f"{gal_name}_rar_plane_gaia_abs_test"

fig1_png = f"{base}_rar_plane.png"
fig1_pdf = f"{base}_rar_plane.pdf"
fig2_png = f"{base}_velocity.png"
fig2_pdf = f"{base}_velocity.pdf"
fig3_png = f"{base}_residuals.png"
fig3_pdf = f"{base}_residuals.pdf"

fig1.savefig(fig1_png, bbox_inches="tight")
fig1.savefig(fig1_pdf, bbox_inches="tight")
fig2.savefig(fig2_png, bbox_inches="tight")
fig2.savefig(fig2_pdf, bbox_inches="tight")
fig3.savefig(fig3_png, bbox_inches="tight")
fig3.savefig(fig3_pdf, bbox_inches="tight")

out_df = pd.DataFrame({
    "r_kpc": r,
    "v_obs_kms": v_obs,
    "eV_kms": v_err,
    "v_bar_kms": v_bar,
    "v_rar_kms": v_rar,
    "v_gaia_abs_kms": v_abs,
    "g_bar_sp": g_bar,
    "g_obs_sp": g_obs,
    "g_rar_sp": g_rar,
    "g_gaia_abs_sp": g_abs,
    "g_bar_mps2": g_bar_si,
    "g_obs_mps2": g_obs_si,
    "g_rar_mps2": g_rar_si,
    "g_gaia_abs_mps2": g_abs_si,
    "delta_g_required": delta_g_required,
    "delta_g_rar": delta_g_rar,
    "delta_g_abs": delta_g_abs,
    "res_v_bar": res_v_bar,
    "res_v_rar": res_v_rar,
    "res_v_abs": res_v_abs,
    "res_g_bar": res_g_bar,
    "res_g_rar": res_g_rar,
    "res_g_abs": res_g_abs,
})
csv_name = f"{base}_table.csv"
out_df.to_csv(csv_name, index=False)

metrics_df = pd.DataFrame([{
    "galaxy": gal_name,
    "yd_fixed": yd_fixed,
    "yb_fixed": yb_fixed,
    "gdag_fixed_sp": gdag_fixed,
    "Rmax_kpc": Rmax,
    "lambda_fraction": lambda_fraction,
    "lambda_eff_kpc": lambda_eff,
    "smooth_scale_kpc": smooth_scale_kpc,
    "A_abs": A_abs,
    "rms_v_bar": rms(res_v_bar),
    "rms_v_rar": rms(res_v_rar),
    "rms_v_abs": rms(res_v_abs),
    "rms_g_bar": rms(res_g_bar),
    "rms_g_rar": rms(res_g_rar),
    "rms_g_abs": rms(res_g_abs),
    "rar_plane_rms_rar_log10": rar_plane_rms_rar,
    "rar_plane_rms_gaia_abs_log10": rar_plane_rms_abs,
}])
metrics_csv = f"{base}_metrics.csv"
metrics_df.to_csv(metrics_csv, index=False)

# Save code snapshot
ip = get_ipython()
input_history = ip.user_ns.get("In", []) if ip is not None else []
current_cell_code = input_history[-1] if input_history else "# Current cell code not found."
py_name = f"{base}.py"
with open(py_name, "w", encoding="utf-8") as f:
    f.write(current_cell_code)

zip_name = f"{base}_bundle.zip"
with zipfile.ZipFile(zip_name, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for fn in [
        fig1_png,
        fig1_pdf,
        fig2_png,
        fig2_pdf,
        fig3_png,
        fig3_pdf,
        csv_name,
        metrics_csv,
        py_name,
    ]:
        if os.path.exists(fn):
            zf.write(fn)

print("\nSaved:")
for fn in [
    fig1_png,
    fig1_pdf,
    fig2_png,
    fig2_pdf,
    fig3_png,
    fig3_pdf,
    csv_name,
    metrics_csv,
    py_name,
    zip_name,
]:
    if os.path.exists(fn):
        print(" -", fn)

files.download(zip_name)

✅✅✅ Output

(See figures 90-129)

All runs used the same fixed parameters: disk M/L = 0.5, bulge M/L = 0.7,
RAR gdag = 3702.84 (km/s)^2/kpc, Gaia lambda fraction = 0.18, and Gaussian
smoothing = lambda_eff / 4.

Per-galaxy results (Rmax and lambda_eff in kpc; velocity RMS in km/s;
acceleration RMS in (km/s)^2/kpc; RAR-plane RMS in dex; RMS triples are
baryonic / RAR / Gaia ABS):

Galaxy     Rmax   lam_eff  A_abs    RMS_v (bar/RAR/ABS)   RMS_g (bar/RAR/ABS)      Plane (RAR/ABS)
UGC06399    7.85   1.41    1027.0   38.7 /  4.1 /  4.8     931.9 /  177.7 / 225.1   0.0757 / 0.0964
NGC0100     9.62   1.73     828.7   37.3 /  6.0 /  5.8     746.5 /  359.3 / 263.3   0.1911 / 0.1787
DDO161     13.37   2.41     324.2   23.3 / 18.5 /  3.9     267.3 /  494.2 /  74.5   0.3081 / 0.0743
F574-1     12.60   2.27    1198.2   48.5 /  8.8 /  8.0     983.7 /  304.5 / 338.3   0.1201 / 0.1434
UGC08490   10.15   1.83    1700.2   44.9 /  8.9 / 19.1    1487.5 /  591.6 / 678.2   0.1160 / 0.1910
NGC6503    23.50   4.23    1396.6   56.8 /  4.0 / 22.9    1288.5 /  525.7 / 682.4   0.0375 / 0.1556
NGC4157    29.61   5.33     799.9   52.6 / 22.0 / 12.7     972.6 / 1330.7 / 671.2   0.1084 / 0.0700
NGC4183    21.02   3.78    1247.9   48.4 / 10.3 / 18.0    1038.7 /  240.0 / 390.1   0.0814 / 0.1374
UGC07125   18.68   3.36     233.3   21.2 / 30.7 /  5.2     230.4 /  488.3 /  58.5   0.3599 / 0.0696
UGC07089    9.16   1.65     466.0   25.5 / 15.0 /  3.8     446.6 /  462.9 /  77.9   0.2144 / 0.0503
UGC05829    6.91   1.24     506.6   24.8 /  8.6 /  2.4     474.9 /  205.8 /  51.6   0.1162 / 0.0383
UGCA444     2.62   0.47     532.4   14.5 /  4.9 /  1.7     420.6 /  222.6 /  65.9   0.1442 / 0.0598
UGC08286    8.04   1.45    1462.6   44.7 / 11.2 / 10.2    1361.6 /  588.2 / 459.8   0.1569 / 0.1247

⸻

10. Notes

	•	This document represents an early-stage, timestamped result
	•	Further statistical validation and formal analysis to follow
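The validation flagged above can start from the per-galaxy metrics CSVs saved by each cell. A minimal pooling sketch, using four of the RAR-plane RMS values printed in Section 9.9 in place of the file-concatenation step (column names match the saved metrics tables):

```python
import pandas as pd

# Four of the per-galaxy RAR-plane RMS values (dex) reported above; in practice
# these rows would come from concatenating the saved *_metrics.csv files.
summary = pd.DataFrame({
    "galaxy": ["UGC06399", "DDO161", "NGC6503", "UGC07125"],
    "rar_plane_rms_rar_log10": [0.0757, 0.3081, 0.0375, 0.3599],
    "rar_plane_rms_gaia_abs_log10": [0.0964, 0.0743, 0.1556, 0.0696],
})

# Per-galaxy flag: does the Gaia ABS line beat the canonical RAR line?
summary["abs_wins"] = (
    summary["rar_plane_rms_gaia_abs_log10"] < summary["rar_plane_rms_rar_log10"]
)
print(summary[["galaxy", "abs_wins"]])
print(f"Gaia ABS beats RAR on the plane in {summary['abs_wins'].mean():.0%} of this subset")
```

On this four-galaxy subset the split is even; a sample-level statement needs the full set together with an error model for the scatter.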
